Basic handling of spatial data (like GPS coordinates)

This week I am going to show you some very basic things you can do with GPS coordinates. In this post you will learn how to:

  • Read gpx files
  • Read files containing plain text coordinates (like N 48 12.123 and N 48 12′ 123″)
  • Parse characters string that contain unformatted coordinates (like above)
  • Convert minute and second format to decimal format for easier processing
  • Compute distances, angles and antipodes using the package geosphere
  • Create a basic plot of a gps track

Read gpx files

Ok, so reading them is easy, you just need an XML library. But if you want to use them for further processing it is probably best if you convert it into a data.frame. This can be done with the following method:

readTrackingFile<-function(filename) {
  require(XML)
  require(plyr)
  xmlfile<-xmlParse(filename)
  xmltop<-xmlRoot(xmlfile)
  tracking<-ldply(xmlToList(xmltop[['trk']][['trkseg']]), function(x) {
    data.frame(x)
  })
  
  tracking<-data.frame("ele"=tracking$ele[seq(1, nrow(tracking), 2)], "time"=as.character(tracking$time[seq(1, nrow(tracking), 2)]),"lat"=tracking$.attrs[seq(1, nrow(tracking), 2)], "lon"=tracking$.attrs[seq(2, nrow(tracking), 2)])
  
  tracking$ele<-as.numeric(levels(tracking$ele))[tracking$ele]
  tracking$lat<-as.numeric(levels(tracking$lat))[tracking$lat]
  tracking$lon<-as.numeric(levels(tracking$lon))[tracking$lon]
  
  x<-strptime(as.character(tracking$time), "%Y-%m-%dT%H:%M:%SZ")
  tracking$min<-60*x$hour + x$min
  
  message(paste("read", nrow(tracking), "tracking points"))
  
  return(tracking)
}

track<-readTrackingFile("data/LJWSIZUT_04_11.gpx")
head(track)

First we read the file using an XML library, then we get the root element and then use ldply (which is pretty cool!) to transform the whole thing into a data.frame. Just like that. I have to admit I did not come up with this on my own, but I do not remember where I got it from, probably stackoverflow. 'trk' and 'trkseg' are XML-elements as you probably figured out.

But so far you only get a data.frame which has a separate row for lon and lat values, that is because they are attributes. Below you see the structure of the gpx file:



  
    
      Generated by Tractive Platform Service
    
  
  
    Tracker LJWSIZUT, 2015-04-12T09:59:34+00:00 2015-04-11T22:14:13Z
    
      
        419
        
      

[...]

And here you can see the first two rows of the data.frame after ldply:

     .id ele                 time     .attrs
1  trkpt 419 2015-11-04T13:12:32Z 48.3201233
2  trkpt 419 2015-11-04T13:12:32Z 14.6258633

That is why I applied

  tracking<-data.frame("ele"=tracking$ele[seq(1, nrow(tracking), 2)], "time"=as.character(tracking$time[seq(1, nrow(tracking), 2)]),"lat"=tracking$.attrs[seq(1, nrow(tracking), 2)], "lon"=tracking$.attrs[seq(2, nrow(tracking), 2)])

which selects every other row for elevation and time, but each row for latitude and longitude.

Afterwards the method only converts the factors to numerics (more on factors) and convert the time into minutes from midnight (I will be using this data for a project and the time might be useful).

The input file I use is a tracking file from my Tractive GPS Pet Tracking device. You can download it here.

The output of the above code looks like:

  ele                 time      lat      lon min
1 419 2015-11-04T13:12:32Z 48.32012 14.62586 792
2 399 2015-11-04T13:48:51Z 48.32021 14.62588 828
3 384 2015-11-04T13:54:23Z 48.32028 14.62590 834
4 391 2015-11-04T14:58:33Z 48.32027 14.62588 898
5 422 2015-11-04T15:08:33Z 48.32006 14.62588 908
6 439 2015-11-04T15:19:01Z 48.32008 14.62632 919

Read plain text coordinates

Sometimes you might encounter coordinates like this:

48 22.123
48° 22.124'
48° 22.134
48°22.123
48°22.123'
48 22 123
48° 22' 123"
48°22'123"

First I thought this could not be a problem but R taught me otherwise. The solution that works for me is the following:

testLines<-read.table("testCoords.txt", sep=";", header=F, quote="", stringsAsFactors = FALSE)$V1
testLines

which prints:

[1] "48 22.123"     "48° 22.124'"   "48° 22.134"    "48°22.123"     "48°22.123'"   "48 22 123"     "48° 22' 123\"" "48°22'123\""
  • Why not use readLines()? The problem with readLines(), at least for me, was that it encountered and "incomplete final line found on testCoords.txt" when there is not an extra empty line after the last coordinates.
  • sep=";": This can be any separator except a space.
  • quote="": Otherwise you get problems with the second symbol ("). (stackoverflow)

The most important thing for me was to set the locale settings, because before that R always replaced the degree symbol by some ASCII representation ...

Sys.setlocale("LC_ALL", 'en_GB.UTF-8')

Otherwise not even changing the encoding of the file and using the encoding parameter worked for me!

Parse character strings

So, now we have read these unformatted gps coordinates, but we can not compute anything with them. So I wrote a parsing function, that should cover the cases that I could think of.

parseCoord<-function(coordChar) {
  if (!is.character(coordChar)) {
    stop("coord is not in character format.")
  }
  coordChar<-trim(coordChar)
  direction<-substr(coordChar, 1, 1)
  
  if ((direction %in% c("N", "E", "S", "W"))) {
    coordChar<-substr(coordChar, 2, nchar(coordChar))
    coordChar<-trim(coordChar)
  }
  
  splitted<-unlist(strsplit(coordChar, "[ °'\"]"))
  splitted<-splitted[which(splitted!="")]
  deg<-as.numeric(splitted[1])
  
  min<-NULL
  sec<-NULL
  
  if (length(splitted) > 1) { # means it also contains minutes
    min<-as.numeric(splitted[2])
    
    if (length(splitted) > 2) { # means it also contains seconds
      sec<-as.numeric(splitted[3])
    } 
  } else {
    warning ("your coordinates do not seem to contain minutes.")
  }

  return(list("dir"=direction, "deg"=deg, "min"=min, "sec"=sec))
}

This function works for formats that contain °, " and ' as symbols, but also without them. And it does not matter whether there is an additional space before/after each symbol. Giving a direction (N, S, E, W) is also optional.

Check and convert coordinates

So far we have read and parsed different coordinate formats, but we still can not compute with them. So I wrote methods to convert my rather simple coordinate format to a decimal format.

We need a little helper function which I found on stackoverflow.

trim <- function (x) gsub("^\\s+|\\s+$", "", x)

First I created two methods two check the format, so that the user can not just put anything into the conversion methods:

isSecond<-function(coord) {
  if (is.null(coord[["deg"]])) { return (FALSE) }
  if (is.null(coord[["min"]])) { return (FALSE) }
  if (is.null(coord[["sec"]])) { return (FALSE) }
  
  if (!is.numeric(coord[["deg"]])) {return (FALSE) }
  if (!is.numeric(coord[["min"]]) || coord[["min"]] > 60 || coord[["min"]] < 0) {return (FALSE) }
  if (!is.numeric(coord[["sec"]]) || coord[["sec"]] > 60 || coord[["sec"]] < 0) { return (FALSE) }
  
  return (TRUE)
}

isMinute<-function(coord) {
  if (is.null(coord[["deg"]])) { return (FALSE) }
  if (is.null(coord[["min"]])) { return (FALSE) }
  if (!is.null(coord[["sec"]])) { return (FALSE) }
  
  if (!is.numeric(coord[["deg"]])) { return (FALSE) }
  if (!is.numeric(coord[["min"]]) || coord[["min"]] > 60 || coord[["min"]] < 0) { 
    return (FALSE) 
  }
  return (TRUE)
}

The conversion functions are a lot shorter and pretty simple:

min2dec<-function(coord) {
  
  if (!isMinute(coord)) {
    stop("coord is not in minute format.")
  }  
  dec <- coord$deg + coord$min / 60
  return (dec)
}

sec2dec<-function(coord) {
  if (!isSecond(coord)) {
    stop("coord is not in second format.")
  }  
  dec <- coord$deg + coord$min / 60 + coord$sec / 3600 
  return (dec)
}

Compute distances, angles and antipodes

Now we can finally compute things.

There are basically two ways how to compute distances on earth ("Great circle" and "Rhumb line"), but this would be enough information for another post. For now we are going to focus on "Great circle" or "as the crow flies" (german: Fluglinie).

The package geosphere introduces three different methods for computing distances along the "great circle". As an example we compute the distance between my cat (the first tracking point) and our house.

distMeeus(c(track$lon[1], track$lat[1]), c(house$lon, house$lat))

The result is:
[1] 12.91136

distCosine(c(track$lon[1], track$lat[1]), c(house$lon, house$lat))

The result is:
[1] 12.92392

distHaversine(c(track$lon[1], track$lat[1]), c(house$lon, house$lat))

The result is:
[1] 12.92394

The results are in meters and we can see that there are already some differences. This is because the methods distCosine() and distHavarsine() assume a spherical earth and distMeeus() assumes an ellipsoid earth which is a lot more accurate.

To see a bigger distance I computed the distance between the northpole and my house (in km):

northpol<-list("lat"=90, "lon"=0)
distMeeus(c(northpol$lon, northpol$lat), c(house$lon, house$lat))/1000

The result is:
4647.923

distCosine(c(northpol$lon, northpol$lat), c(house$lon, house$lat))/1000

The result is:
4639.77

distHaversine(c(northpol$lon, northpol$lat), c(house$lon, house$lat))/1000

4639.77

Interestingly, the first method now computes a longer distance than the two other methods (in contrast to the cat-house example) and the two spherical methods even have the same result.

So, now I know how far I am away from the northpole and more important, from my cat. The next thing I need to find out is in what direction I have to walk to get to her:

bearing(c(house$lon, house$lat), c(track$lon[1], track$lat[1]))

The result is:
168.3352

Here it is important to use the starting point as the first parameter!

What can also easily be computed with the geosphere package is the antipode of a point.

antipode(c(house$lon, house$lat))

The result is:
-165.3742 -48.32024

You might figure out that you simply have to switch the sign for the latitude and you have to distract 180 for the longitude 🙂

Create a basic plot of a gps track

Finally my favourite part, plotting!

First, lets create a plot of my cats tracking points (marking the first one green and the last one orange):

plot(track$lon, track$lat, pch=19, cex=0.5, col=c("green", rep("black", length(track$lon)-2), "orange"))

Add the position of my house, to see roughly where my cat likes to go:

points(house$lon, house$lat, col="blue", pch=19)

Add lines to connect the dots:

lines(track$lon, track$lat)

The next thing is just a little playing around, I added the distance and the bearing to each datapoint:

for (i in 1:length(track$lon)) {
  d<-round(distMeeus(c(track$lon[i], track$lat[i]), c(house$lon, house$lat)), digits=0)  
  if (d > 25) {
    a<-round(bearing(c(track$lon[i], track$lat[i]), c(house$lon, house$lat)), digits=2) 
    text(track$lon[i], track$lat[i] + 0.00005 , labels=paste(d, "m", a), col="red", cex=0.7)
  }
}

And that is the final plot:

Stalking my cat (vh)
Stalking my cat (vh)

And here you can download the complete source used in this tutorial.

Leave a Reply

Your email address will not be published. Required fields are marked *