While cleaning OpenStreetMap data for Warsaw, I needed to convert selected information from an XML file to CSV format. Although the operation itself is rather straightforward, I find it a good opportunity to share a snippet of working code.
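As a rough illustration of this kind of conversion, here is a minimal sketch that streams `node` elements from an OSM-style XML document into CSV using only the standard library. The sample data, the `nodes_to_csv` helper and the choice of the `id`, `lat` and `lon` attributes are assumptions made for the example, not the code from the original post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for an OSM extract; a real file would be far larger.
SAMPLE_OSM = """<osm>
  <node id="1" lat="52.2297" lon="21.0122">
    <tag k="name" v="Warszawa"/>
  </node>
  <node id="2" lat="52.2319" lon="21.0067"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>"""

NODE_FIELDS = ["id", "lat", "lon"]  # assumed selection of attributes


def nodes_to_csv(xml_source, csv_target):
    """Write one CSV row per <node> element, reading the XML in a single pass."""
    writer = csv.DictWriter(csv_target, fieldnames=NODE_FIELDS, lineterminator="\n")
    writer.writeheader()
    count = 0
    # iterparse yields each element once its closing tag has been read,
    # so the whole document never has to be parsed up front.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "node":
            writer.writerow({field: elem.get(field, "") for field in NODE_FIELDS})
            count += 1
            elem.clear()  # drop the node's attributes and children once written
    return count


out = io.StringIO()
written = nodes_to_csv(io.BytesIO(SAMPLE_OSM.encode("utf-8")), out)
print(out.getvalue())
```

With a real extract you would pass an open file (or a path) instead of the in-memory buffers used here.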
Beginning programmers, myself included, often write code full of loops that scan the same document again and again, which is not very efficient when working with medium and large files. Having suffered quite a bit because of this pattern, I would like to share a brief reflection on the ongoing struggle to make programs both simpler and more efficient.
You might wonder why an Excel user would turn to Pandas at all. A reason is not difficult to find. For example, Excel cannot cope with large datasets: a single worksheet is limited to 1,048,576 rows. With Pandas, you can easily perform a preliminary analysis of a large file, get statistics for the entire dataset, and extract the smaller chunks of data you consider particularly important, in case you’d like to perform some additional operations on them in Excel.
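A minimal sketch of that workflow, assuming Pandas is installed: the file is read in chunks, a running statistic is accumulated for the whole dataset, and only the rows of interest are kept. The `district` and `price` columns and the tiny in-memory dataset are made up for the example.

```python
import io

import pandas as pd

# Small in-memory stand-in for a large CSV file; the columns are hypothetical.
raw = io.StringIO(
    "district,price\n"
    + "\n".join(f"{'Mokotow' if i % 2 else 'Wola'},{100 + i}" for i in range(10))
)

total_rows = 0
price_sum = 0
selected = []

# chunksize makes read_csv return an iterator of DataFrames,
# so only a few rows are held in memory at a time.
for chunk in pd.read_csv(raw, chunksize=4):
    total_rows += len(chunk)
    price_sum += chunk["price"].sum()
    selected.append(chunk[chunk["district"] == "Wola"])

mean_price = price_sum / total_rows              # statistic for the entire file
subset = pd.concat(selected, ignore_index=True)  # the part worth a closer look

# subset.to_csv("wola.csv", index=False)  # small enough to open in Excel
```

The chunk size of 4 only suits this toy input; for a real file, a value in the tens or hundreds of thousands of rows is more typical.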
The violin plot is among the less popular visualisation techniques, largely because of its rather ambiguous character. Just like a pie chart, it does not give definite numbers, but provides a visual representation of possible trends in the data, which aids subsequent in-depth analysis of the corpus. Because of that, violin plots are well suited to exploratory data analysis, which precedes the formation of hypotheses and the analysis proper.
Have you ever wanted to parse a CSV file in Python without using the csv module? Wait no longer: below is a description of a simple CSV parser, which could come in handy if the operations you want to perform are too simple to warrant importing the csv module.
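To give an idea of how little is needed, here is a minimal sketch of such a parser. The `parse_csv` and `as_records` helpers are hypothetical names chosen for this example, and the parser deliberately ignores quoted fields, so a delimiter inside quotes would be split incorrectly.

```python
def parse_csv(text, delimiter=","):
    """Split CSV text into a list of rows, each a list of stripped strings.

    Simplifying assumption: fields are never quoted and never contain
    the delimiter or a line break.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([field.strip() for field in line.split(delimiter)])
    return rows


def as_records(rows):
    """Treat the first row as a header and return one dict per data row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


sample = "name,city\nAnna,Warsaw\n\nJan, Krakow\n"
rows = parse_csv(sample)
records = as_records(rows)
print(records)
```

Once the data contains quoted fields or embedded line breaks, the csv module becomes the safer choice.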