This graph maps some of the international flight routes onto a worldmap background with some “lighting” effect.
I did this plot simply because it looks so cool to me. It demonstrates how powerful R (and ggplot) can be when visualizing data. The original blog, however, does not have codes or data that you can copy & paste, so I created this tutorial to give a detailed implementation to make that easier. :)
(all codes and processed data that I used can be found in my github)
Ok. Let’s get to the point.
1) Background

You will need to create a world-map background where you can add anything on top of it.
However, in order to get the background with the “lighting” effect (like above), you will need to download and use the urban data from this link. After downloading the data, put the “ne_10m_urban_areas.shp” in your directory and execute the lines below should give you such a “World at Night” background:
library(ggplot2) library(maps) library(rgeos) library(maptools) library(ggmap) library(geosphere) library(plyr) urbanareasin <- readShapePoly("ne_10m_urban_areas.shp") worldmap = map_data ("world") wrld<-c(geom_polygon(aes(long,lat,group=group), size = 0.1, colour= "#090D2A", fill="#090D2A", alpha=0.8, data=worldmap)) urb <- c(geom_polygon(aes(long, lat, group = group), size = 0.3, color = "#ffffff", fill = "#ffffff", alpha = 1, data = urbanareasin)) ggplot() + wrld + urb + theme(panel.background = element_rect(fill='#00001C',colour='#00001C'), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + annotate("text",x=max(worldmap$long),y=-60,hjust=.9,size=3, label=paste("World at night!","Created by Weimin Wang",sep="\n"),color="white")
2) Routes
2.1) Data Processing
The next steps will require a bit of data processing. You can download the routes.dat data from OpenFlights. However, the raw data will look like this:
2B | 410 | AER | 2965 | KZN | 2990 | 0 | CR2 | |
2B | 410 | ASF | 2966 | KZN | 2990 | 0 | CR2 | |
2B | 410 | ASF | 2966 | MRV | 2962 | 0 | CR2 | |
2B | 410 | CEK | 2968 | KZN | 2990 | 0 | CR2 | |
2B | 410 | CEK | 2968 | OVB | 4078 | 0 | CR2 | |
2B | 410 | DME | 4029 | KZN | 2990 | 0 | CR2 |
You will only need source and destination for each route in order to plot them on the map, so just keep the 4th and 6th columns as below .
Source Airport ID | Destination Airport ID |
2966 | 2990 |
2966 | 2962 |
2968 | 2990 |
2968 | 4078 |
4029 | 2990 |
Next is to convert the airport IDs to countries or cities. What you need to do is simply download the airports.dat file from the same link, and use it as dictionary to look up for the country names for each of the 4-digit airport identifiers. The top 3 lines of the airports.dat file look like this:
1 | Goroka | Goroka | Papua New Guinea | GKA | AYGA | -6.08169 | 145.3919 | 5282 | 10 | U | Pacific/Port_Moresby |
2 | Madang | Madang | Papua New Guinea | MAG | AYMD | -5.20708 | 145.7887 | 20 | 10 | U | Pacific/Port_Moresby |
3 | Mount Hagen | Mount Hagen | Papua New Guinea | HGU | AYMH | -5.82679 | 144.2959 | 5388 | 10 | U | Pacific/Port_Moresby |
You do not need all the information. Just keep the relevant cols (1st and 4th cols if you want to convert to countries), and delete the rest as below:
Airport ID | Country |
1 | Papua New Guinea |
2 | Papua New Guinea |
3 | Papua New Guinea |
4 | Papua New Guinea |
… | |
3364 | China |
Now you have the dictionary file with you. Next you will need to write a simple python or R scripts (or in excel) to convert each 4-digit identifier in your routes file to a country name. The converted file will look something like this below (If you want to skip all the data processing steps, simply go to my github to download and use my processed data):
Source Country | Destination Country |
China | United States |
Singapore | United Kingdom |
Russia | Russia |
(You may wish to consider deleting rows that have the same source and destination countries (domestic flights). This will substantially reduce the lines of the data and reduce processing time as well)
The final step is to convert each of the country names to a pair of (lat, long). You can do this by first using R to get (lat, long) pairs for all distinct country names (around 226 of them), and use it as a dictionary to process the above. (To get (lat, long) for China for example, simply use function geocode(“China”) which will return lat long values)
country | lat | lon |
Russia | 61.52401 | 105.3188 |
Azerbaijan | 40.14311 | 47.57693 |
Peru | -9.18997 | -75.0152 |
…… |
Once you have translated each country to its (lat, long) pair, you might have something like this below:
count | from | to | lat1 | lon1 | lat2 | lon2 |
1 | Afghanistan | Azerbaijan | 33.93911 | 67.70995 | 40.14311 | 47.57693 |
7 | Afghanistan | India | 33.93911 | 67.70995 | 20.59368 | 78.96288 |
5 | Afghanistan | Iran | 33.93911 | 67.70995 | 32.42791 | 53.68805 |
1 | Afghanistan | Kuwait | 33.93911 | 67.70995 | 29.31166 | 47.48177 |
4 | Afghanistan | Pakistan | 33.93911 | 67.70995 | 30.37532 | 69.34512 |
1 | Afghanistan | Tajikistan | 33.93911 | 67.70995 | 38.86103 | 71.27609 |
3 | Afghanistan | Turkey | 33.93911 | 67.70995 | 38.96375 | 35.24332 |
This will be your final data you will need for plotting (routes_all.csv). (“count” col refers to number of times that corresponding route appears in the data; and (lat2, lon2) refers to Destination country). And it has been reduced to 4470 rows from its originally around 64700 rows.
You can plot all the routes on the map, but it will rather look very clustered. Instead, you can select only countries that you are interested to plot. For example, you can select top 10 countries based on total counts.
total count | country |
2380 | United States |
2326 | United Kingdom |
1899 | Spain |
1833 | Germany |
1355 | France |
1299 | Italy |
1105 | China |
809 | Russia |
725 | United Arab Emirates |
678 | Canada |
plot_countries = c('United States', 'United Kingdom', 'Spain', 'Germany', 'France', 'Italy','China', 'Russia', 'United Arab Emirates', 'Canada')
2.2) Plot
With all the data, you can now execute the scripts below to do the plotting for you. (provided routes_all.csv in your directory)
library(ggplot2) library(maps) library(rgeos) library(maptools) library(ggmap) library(geosphere) library(plyr) setwd('\your\directory') rm(list = ls()) fortify.SpatialLinesDataFrame = function(model, data, ...){ ldply(model@lines, fortify) } df = read.csv('routes_all.csv', stringsAsFactor=F) plot_countries = c('United States', 'United Kingdom', 'Spain', 'Germany', 'France', 'Italy','China', 'Singapore') df = df[df$from %in% plot_countries,] # calculate routes for each row routes = gcIntermediate(df[,c('lon1', 'lat1')], df[,c('lon2', 'lat2')], 200, breakAtDateLine = FALSE, addStartEnd = TRUE, sp=TRUE) # fortify to dataframe fortifiedroutes = fortify.SpatialLinesDataFrame(routes) # merge to form great circles routes_count = data.frame('count'=df$count, 'id'=1:nrow(df), 'Countries'=df$from) greatcircles = merge(fortifiedroutes, routes_count, all.x=T, by='id') # get worldmap worldmap = map_data ("world") # wrld layer wrld<-c(geom_polygon(aes(long,lat,group=group), size = 0.1, colour= "#090D2A", fill="#090D2A", alpha=0.8, data=worldmap)) # urban layer urbanareasin <- readShapePoly("ne_10m_urban_areas.shp") urb <- c(geom_polygon(aes(long, lat, group = group), size = 0.3, color = "#ffffff", fill = "#ffffff", alpha = 0.8, data = urbanareasin)) # final combine ggplot() + wrld + urb + geom_line(aes(long,lat,group=id, color=Countries), alpha = 0.3, size=0.01, data= greatcircles) + theme(panel.background = element_rect(fill='#00001C',colour='#00001C'), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(legend.position = c(0,0.4), legend.justification = c(0,1), legend.background = element_rect(colour = NA, fill = NA), legend.key = element_rect(colour = NA, fill = NA, size = 10), legend.text = element_text(colour='white', size = 20)) + #guides(fill = guide_legend(keywidth = 20, keyheight = 20)) + annotate("text",x=max(worldmap$long),y=-60,hjust=.9,size=6, label=paste("Flight routes from top 8 countries","Created by Your name","Data From OpenFlights.org", sep="\n"),color="white")
Leave a comment if you have more interesting ways to do this, :)
Data used are from
http://openflights.org/data.html and http://www.naturalearthdata.com
Reblogged this on .
LikeLike