Welcome to the new weekly post on data visualization! This week, I would like to present a simple plot displaying a fragment of the data from the Ethnic Power Relations dataset. The dataset contains information on the power status of, and power relations between, major ethnic groups in 165 countries between 1946 and 2013. I used it recently while writing up a research project, and I noticed some very interesting patterns emerging in the data for Sri Lanka, so I will use this country as an example here. I perform the analysis and plotting in R, using the ggplot2 library.
The dataset used here covers countries with a population of over 500,000 (in 1990) and a high level of politicization of ethnicity. It contains a number of interesting variables. We can get a peek at the data after importing it into R:
epr <- read.csv("EPR-2014.csv")
head(epr)  # first rows of the dataset
str(epr)   # column names, types and sample values
Calling head(epr) and str(epr) on the imported data displays the first lines of the dataset and some information on the different columns. We can see, for example, that there is no single column describing the period covered by a row, but two separate columns: “from” and “to”. This can be cumbersome when we need to create a plot, so we might choose to create a separate column describing the period. Fortunately, this operation is very easy in R:
epr$period <- with(epr, paste(from, to, sep = "-"))
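To see what the new column looks like without loading the full dataset, here is a minimal sketch on a made-up data frame. The rows in `toy` are invented for illustration and are not taken from the EPR file; only the column names "from" and "to" follow the dataset.

```r
# Toy data: invented rows that only mimic the "from"/"to" columns of the EPR data
toy <- data.frame(
  group = c("A", "A", "B"),
  from  = c(1946, 1956, 1946),
  to    = c(1955, 2013, 2013)
)

# Same operation as above: glue the two year columns into one label
toy$period <- with(toy, paste(from, to, sep = "-"))

print(toy$period)
```

Because each label starts with a four-digit year, sorting the labels alphabetically also sorts them chronologically, which is convenient when the column is later used to split a plot into panels.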
With this data frame, we can create a subset containing the information we are interested in, and plot it. While it is not strictly necessary to create a subset beforehand, it can be very useful if we want to perform further operations on the data before we export the graph. In my case, I translated some columns to a different language, an operation which I did not want to perform on the entire dataset.
The following code should suffice to produce the plot:
library(ggplot2)  # provides ggplot() and the layers used below

# Keep only the rows for Sri Lanka
sl <- subset(epr, statename == "Sri Lanka")
# Order the groups explicitly so the largest ones come first on the x axis
sl$group <- factor(sl$group, levels = c("Sinhalese", "Sri Lankan Tamils", "Indian Tamils", "Moors (Muslims)"))

ggplot(sl, aes(x = group, y = size, fill = status)) +
  geom_bar(stat = "identity") +  # bar height is taken directly from size
  theme(axis.text.x = element_text(face = "bold", size = 10, angle = 45)) +
  facet_wrap(~period, nrow = 2) +  # one panel per period
  scale_fill_brewer(type = "div") +
  ylab("Proportion of the population") +
  xlab("Ethnic group") +
  ggtitle("Ethnic relations in Sri Lanka", subtitle = "According to the dataset 'Ethnic Power Relations'")
In the code, I converted the group column to a factor with explicit levels to give the plot more structure: the largest ethnic groups come first instead of ending up in the middle of the graph. The visualization contains a lot of information (three variables per panel, with one panel per time period), but it is not overly difficult to read. The X and Y axes are clearly labeled, and colour indicates the power status of each group. Changes in power status between periods are easy to spot.
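The effect of setting the factor levels explicitly can be checked in isolation, without the dataset or ggplot2. The sketch below reuses the four group names from the plotting code in a plain character vector; the vector itself is made up for the example. ggplot2 places discrete values on an axis in level order, so the order printed by levels() is the order in which the bars would appear.

```r
# Group names as they appear in the plotting code, in arbitrary order
groups <- c("Moors (Muslims)", "Sinhalese", "Indian Tamils", "Sri Lankan Tamils")

# Default: levels are sorted alphabetically, so "Sinhalese" ends up third
default_order <- factor(groups)
print(levels(default_order))

# Explicit levels: we choose the order ourselves, largest groups first
custom_order <- factor(groups, levels = c("Sinhalese", "Sri Lankan Tamils",
                                          "Indian Tamils", "Moors (Muslims)"))
print(levels(custom_order))
```

One caveat: any value that is not listed in levels becomes NA, so the names passed to factor() have to match the data exactly.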
The construction of the plot was easy because the country we chose has a rather straightforward ethnic composition. A similar plot constructed for the USA, India or Russia would be totally illegible. In these cases, we would need some more elaborate filtering.
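One possible form of such filtering, sketched here as an assumption rather than as the approach used for this post, is to keep only the groups above some population share before plotting. The data frame `toy` and the 10% cut-off `min_share` are invented for illustration; only the column names `group` and `size` follow the code above.

```r
# Invented shares for a hypothetical country with many groups
toy <- data.frame(
  group = c("G1", "G2", "G3", "G4", "G5"),
  size  = c(0.60, 0.20, 0.12, 0.05, 0.03)
)

# Hypothetical cut-off: drop groups below 10% of the population
min_share <- 0.10
major <- subset(toy, size >= min_share)

# Re-create the factor so it only has levels for the remaining groups
major$group <- factor(major$group)

print(major)
```

The filtered data frame could then be passed to the same ggplot() call in place of the full subset. Whether a fixed threshold is appropriate depends on the country and on the question being asked, since it hides small groups that may still be politically relevant.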