Welcome to the weekly post on data visualization! Today I would like to build on the subject I started two weeks ago, when I built a visualization presenting ethnic power relations. Here, I will build a map portraying conflict areas. The dataset I use was prepared by the Uppsala Conflict Data Program, and is available for download from its website. As the dataset is rather large, I will present here only a fraction of it pertaining to India.
Welcome to the new weekly post on data visualization! This week, I would like to present a simple plot displaying a fragment of the data from the Ethnic Power Relations dataset. The data contains information on the power status of, and power relations between, major ethnic groups in 165 countries between the years 1946 and 2013. I used the dataset recently while writing up a research project, and I noticed some very interesting patterns emerging in the data pertaining to Sri Lanka, so I will use this country as an example here. I perform the analysis and plotting in R, using the ggplot2 library.
Recently I did some research on the state of e-commerce in Poland, which was necessary for a position I applied for. While the project did not require presenting any statistical or numeric data, I figured it would be nice to attach a simple plot portraying how the field has changed over the past several years. The following is a record of my struggle.
While working on cleaning the OpenStreetMap data for Warsaw, I needed to convert selected information from an XML file to CSV format. While the operation itself is rather straightforward, I find it a good opportunity to share a snippet of working code.
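To give a flavour of such a conversion, here is a minimal sketch using only the Python standard library. It is not the code from the original post: the function name `osm_nodes_to_csv`, the choice of the `name` and `amenity` tags, and the tiny sample document are all assumptions made for illustration. It streams through `node` elements and writes one CSV row per node.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the general shape of an OpenStreetMap XML export.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="52.2297" lon="21.0122">
    <tag k="name" v="Centrum"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="52.2400" lon="21.0300"/>
</osm>
"""


def osm_nodes_to_csv(xml_source, csv_target, tag_keys=("name", "amenity")):
    """Write one CSV row per <node>; return the number of rows written.

    xml_source is a file name or a binary file object, csv_target is a
    text file object. tag_keys (an assumption) lists the tags to export.
    """
    fieldnames = ["id", "lat", "lon", *tag_keys]
    writer = csv.DictWriter(csv_target, fieldnames=fieldnames)
    writer.writeheader()
    rows = 0
    # iterparse streams the file, so the whole XML tree is never built at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "node":
            continue
        row = {key: elem.get(key, "") for key in ("id", "lat", "lon")}
        # Collect the node's <tag k="..." v="..."/> children into a dict.
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        for key in tag_keys:
            row[key] = tags.get(key, "")
        writer.writerow(row)
        rows += 1
        elem.clear()  # drop the node's children to keep memory use low
    return rows


if __name__ == "__main__":
    output = io.StringIO()
    osm_nodes_to_csv(io.BytesIO(SAMPLE_OSM), output)
    print(output.getvalue())
```

For a real export, the two in-memory buffers would be replaced by a path to the XML file and a file opened with `open(path, "w", newline="")`.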
A Billion Wicked Thoughts by Ogi Ogas and Sai Gaddam might not be the first book to come to the mind of an analyst preparing a reading list for fellow learners. The book, concerned with the mystery of human desire (and with the porn-searching habits of Internet users), seems more suitable for graduates of the humanities, broadly understood. This is suggested not only by the authors’ constant references to pop-cultural tropes and stand-up comedians, but also by the almost complete lack of statistics (which occur, in simplified form, only in the notes collected at the back of the book). Yet there is one aspect of the book which can be of much interest to an analysis enthusiast: the data.
Beginning programmers, myself included, often write code abounding with loops that scan the same document time and again, which is not very efficient when working with medium and large files. Having suffered quite a bit because of this pattern, I would like to share a brief reflection on the ongoing struggle to make programs both simple and efficient.
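The pattern is easy to illustrate with a small sketch. This is not taken from the original post: the keyword-counting task and both function names are assumptions chosen for illustration. The first version rescans every line once per keyword, while the second reads the lines a single time and uses a set for the membership check.

```python
def count_rescanning(lines, keywords):
    """Scan all lines again for every keyword (lines must be a list)."""
    counts = {}
    for keyword in keywords:
        counts[keyword] = 0
        for line in lines:  # full pass over the document per keyword
            for word in line.split():
                if word == keyword:
                    counts[keyword] += 1
    return counts


def count_single_pass(lines, keywords):
    """Read the lines once; look each word up in a set of keywords."""
    wanted = set(keywords)
    counts = dict.fromkeys(keywords, 0)
    for line in lines:  # one pass, regardless of the number of keywords
        for word in line.split():
            if word in wanted:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    document = ["the cat sat", "the dog sat", "a cat ran"]
    print(count_rescanning(document, ["cat", "sat", "bird"]))
    print(count_single_pass(document, ["cat", "sat", "bird"]))
```

Both functions return the same counts; the difference is that the work done by the second one grows with the size of the document alone, rather than with the size of the document multiplied by the number of keywords.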
You might wonder why an Excel user would turn to Pandas at all. It is not difficult to find a reason. For example, Excel is at a loss when dealing with large datasets, those exceeding one million rows. With Pandas, you can easily perform an introductory analysis of a large file, get statistics for the entire dataset, and divide it into smaller chunks of data you consider particularly important, in case you’d like to perform some additional operations in Excel.
The violin plot stands among the less popular visualisation techniques, largely because of its rather ambiguous character. Just like a pie chart, it does not give definite numbers, but provides a visual representation of possible trends in the data, which aids subsequent in-depth analysis of the corpus. Because of that, violin plots are perfect tools for exploratory data analysis, which precedes the formation of hypotheses and the analysis proper.
Have you ever wanted to parse a CSV file in Python without using the csv module? Wait no longer: below is a description of a simple CSV parser, which could be handy if the operations you want to perform on the file are too simple to warrant importing the csv module.
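As a rough idea of what such a parser can look like, here is a minimal sketch. It is not the parser from the original post: the name `parse_simple_csv` is hypothetical, and the code assumes a header row and fields that contain no quoted commas or embedded newlines (cases the csv module handles and this sketch does not).

```python
def parse_simple_csv(text, delimiter=","):
    """Parse header-first CSV text into a list of dicts.

    Naive by design: it splits on the delimiter only, so quoted fields
    that contain the delimiter are NOT supported.
    """
    # Skip blank lines so a trailing newline does not produce an empty record.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split(delimiter)]
    records = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split(delimiter)]
        records.append(dict(zip(header, cells)))
    return records


if __name__ == "__main__":
    sample = "name,city\nAnna,Warsaw\nJan,Krakow\n"
    for record in parse_simple_csv(sample):
        print(record["name"], "lives in", record["city"])
```

To read from a file instead of a string, the text can be obtained with `open(path, encoding="utf-8").read()` and passed to the same function.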