This is the third post in my dataset series. The first part gave a more general overview of where to get data. In the second post I listed sources for sports, movies, music and books data. This post covers how to get weather data and public/governmental data, and how to find GIS data.
Over the next couple of weeks you’ll find these posts on my blog:
- General data sources
- TV, music, book ratings and sports data
- Weather, geographical and government data (this post)
- Special data sources (various data sources I found interesting that did not fit any of the above categories)
Sometimes it can be useful to have weather data as an additional input for a model (e.g. when analysing the behaviour of animals). Several websites provide such data (either via an API or as a download). Not all data is free (one website even calls itself “open…” and the first thing you find is its price list; I don’t include it out of protest). Most also provide a search interface that can be used via a web browser.
Currently I am reading the book The Signal and the Noise by Nate Silver, which has really, really interesting insights into which topics can and cannot be predicted, and also a chapter about weather forecasts.
He mentions these services:
- National Weather Service: The NWS provides a Data Portal from which data in different formats can be downloaded.
- AccuWeather.com: On this website you can find weather forecasts, satellite images, … and they also provide premium services.
- weather.com: provides forecasts, radar and satellite images, and also rather catchy-sounding news articles related to weather.
These are some of the links I additionally found:
- Data Access @ National Centers for Environmental Information: They provide detailed weather, wind and other related data (hourly, daily, monthly, …). Some datasets focus on the US, but global datasets are also available. (You might have to pay for some data.)
- wunderground.com: provides some services for free (Geolookup, current conditions, astronomy, …).
- weather.org: provides a search interface that looks up local weather histories for a given location and time. I don’t think it has an API or offers data downloads.
The list could go on forever. Most websites (except for the first one in my list) are not very nicely implemented and look kind of shady.
I also found an R package: weatherData | R-package. At first I was excited, but unfortunately there are currently not many datasets available, and it seems to be a wrapper for wunderground.com (the second link in the list).
If you are from Austria like me, there is a website by the Zentralanstalt für Meteorologie und Geodynamik where you can buy historical weather data.
The most promising seems to be the first in the list, but you need to be lucky that it contains exactly the data you need.
Geo (or more professionally GIS: Geographic Information System) data should be a dream for every data nerd. Google immediately finds a list on Wikipedia containing lots of GIS data sources. Since it does not make sense to copy the whole list or try to add anything to it, I will try to explain the different types of GIS data and focus on data sources specific to Austria.
There is a great overview I found. The main distinction you need to make is between vector and raster data. Vector data is used e.g. for points, lines, polygons and labels. Raster data is used more for things like temperature and air pressure. Most tools can of course deal with both. Here you can also find information on what points, lines (e.g. streets) and polygons (e.g. land use) can be used for. On Wikipedia you can find an overview of GIS file formats (there are several formats for storing vector and raster data). One that is not in the list, but which I use regularly, is the GPS Exchange Format (GPX) for exchanging waypoints, routes, …
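To make the vector/raster distinction a bit more concrete, here is a minimal Python sketch. The class names (`Point`, `Line`, `Raster`), the grid layout and all values are my own assumptions for illustration; real GIS tools and file formats store much more metadata (coordinate reference system, no-data values, …).

```python
import math
from dataclasses import dataclass


# Vector data: discrete features described by explicit coordinates.
@dataclass
class Point:
    lon: float
    lat: float
    label: str


@dataclass
class Line:
    coords: list  # ordered (lon, lat) pairs, e.g. the course of a street


# Raster data: a regular grid of cell values plus the information
# needed to place that grid on the map.
@dataclass
class Raster:
    origin_lon: float  # western edge of the grid
    origin_lat: float  # northern edge of the grid
    cell_size: float   # cell width and height in degrees
    values: list       # rows from north to south, each row west to east

    def value_at(self, lon, lat):
        """Return the value of the cell that contains (lon, lat)."""
        col = math.floor((lon - self.origin_lon) / self.cell_size)
        row = math.floor((self.origin_lat - lat) / self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[row])):
            raise ValueError("coordinate lies outside the raster")
        return self.values[row][col]


# Made-up example: a 2x2 temperature grid (degrees Celsius) and one point.
temperature = Raster(origin_lon=16.0, origin_lat=48.5, cell_size=0.5,
                     values=[[11.0, 12.0],
                             [13.0, 14.0]])
vienna = Point(lon=16.37, lat=48.21, label="Vienna")

print(vienna.label, temperature.value_at(vienna.lon, vienna.lat))
```

The point carries its own coordinates, while a raster cell only has a position relative to the grid origin. That is why looking up a raster value for a vector feature always involves this kind of index calculation.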
To get better resolution you should look for maps created by your government or by other parties interested in creating them. For Austria, for example, a great map is basemap.at, which has very high resolution and is available in 5 different variants. Such maps can be used as input for tools like QGIS.
A very long list of different maps can be found here.
GIS data can also be tracks or routes. A route is a set of points that help you navigate to a target. Each point might store the bearing (angle) and distance to the next point. A track is more like a history of where you have been; depending on your tracking device, the coordinates might be recorded at a fixed time interval (e.g. every 15 seconds). Tracks and routes can be saved in the GPX files mentioned above.
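If your route points only contain coordinates, the bearing and distance to the next point can be derived from them. The following is a minimal sketch using the haversine formula and the standard initial-bearing formula on a spherical Earth. The function names and the sample coordinates are my own assumptions, and a device or app may well compute these values differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees, clockwise from north (0 to 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    y = math.sin(dlambda) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda))
    return (math.degrees(math.atan2(y, x)) + 360) % 360


def annotate_route(points):
    """For every point except the last, add bearing and distance to the next.

    `points` is a list of (lat, lon) tuples; the result is a list of
    (lat, lon, bearing_deg, distance_m) tuples.
    """
    legs = []
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        legs.append((lat1, lon1,
                     bearing_deg(lat1, lon1, lat2, lon2),
                     distance_m(lat1, lon1, lat2, lon2)))
    return legs


# Made-up route with three points.
route = [(48.2082, 16.3738), (48.2100, 16.3738), (48.2100, 16.3800)]
for lat, lon, bearing, dist in annotate_route(route):
    print(f"{lat:.4f}, {lon:.4f}: head {bearing:.0f} degrees for {dist:.0f} m")
```

For the short distances between neighbouring route points the spherical approximation is usually good enough; for surveying-grade accuracy you would want an ellipsoidal model instead.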
Track data is something that you can collect from sports bands, your mobile phone (e.g. using the app ViewRanger, which also allows you to convert your track to a route, which is pretty cool), or actual tracking devices like the Tractive Pet Tracker (there are several companies out there, but this is the one I have). I published the data collected by my cats and I have already written a review of the device.
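Once you have exported a track as GPX, reading it is not much work, because GPX is plain XML. Here is a minimal sketch using only Python's standard library. It assumes a GPX 1.1 file (older files use a different namespace), and the sample track with its coordinates and timestamps is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 files (assumption: your file is GPX 1.1).
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample track with points recorded every 15 seconds.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Sample track</name>
    <trkseg>
      <trkpt lat="48.2082" lon="16.3738"><time>2016-01-01T10:00:00Z</time></trkpt>
      <trkpt lat="48.2083" lon="16.3740"><time>2016-01-01T10:00:15Z</time></trkpt>
      <trkpt lat="48.2085" lon="16.3743"><time>2016-01-01T10:00:30Z</time></trkpt>
    </trkseg>
  </trk>
</gpx>"""


def read_track_points(gpx_text):
    """Return a list of (lat, lon, time) tuples for all track points.

    `time` is the raw timestamp string, or None if the point has none.
    """
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", GPX_NS):
        time_el = trkpt.find("gpx:time", GPX_NS)
        points.append((float(trkpt.get("lat")),
                       float(trkpt.get("lon")),
                       time_el.text if time_el is not None else None))
    return points


for lat, lon, time in read_track_points(SAMPLE_GPX):
    print(lat, lon, time)
```

To read a file from disk you could pass its contents to `read_track_points`. Route points are stored in a similar way, but under `rte`/`rtept` elements instead of `trk`/`trkseg`/`trkpt`.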
Public data like transport, census, health information, employment information, … can be very useful for people who want to perform analyses and create services for themselves and others. Governments all over the world have already noticed this and provide more and more data to the public for free. Governments and companies also invest in open data challenges and encourage programmers and open data enthusiasts to create mobile apps, e.g. for public transport (e.g. #EULife, QLIK Challenge, another list of competitions, …).
To find public data from your government you just need to search for “open data [your country/city]”. You’ll probably find openly available datasets and APIs that can be used for free.
I’ve performed this search for Austria, Germany and the US and was immediately able to compile a list of free services:
- Open Data Austria | www.data.gv.at
- Open Data US | usopendata.org
- Open Government US | www.data.gov
- Open Data Germany | www.govdata.de
- Open Data Community Germany | offenedaten.de
The OECD (Organisation for Economic Co-operation and Development) also provides data from many different countries.
Finding geographic or governmental data is a lot easier than finding data related to media (music, movies) or sports. Sports data is also available, but the ones who own it (mostly TV channels) want to make money with it. As a user commented on my other post, it is because of privacy concerns that e.g. the Netflix Prize data set is not available anymore.
In fact, it is so easy that there are so many lists of data sources available that one can be overwhelmed by the possibilities. On the other hand, chances are that if you are looking for something very specific, you will find it too.