Seaborn violin plot for data exploration

The violin plot is among the less popular visualisation techniques, largely because it is somewhat ambiguous. Like a pie chart, it does not give exact numbers; instead it shows the shape of a distribution and hints at possible trends in the data, which helps direct subsequent in-depth analysis. This makes violin plots well suited to exploratory data analysis, before hypotheses are formed and the analysis proper begins.

Code

The examples in this post are based on the Titanic data from the Kaggle website. You can also download the file from my GitHub account.
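A minimal sketch of such a plot, assuming the Kaggle training file with its `Sex`, `Age` and `Survived` columns. The choice of axes, the function name `plot_age_by_sex` and the local file name `train.csv` are illustrative assumptions, so the output may differ in detail from the figure shown in this post.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_age_by_sex(df):
    """Draw the age distribution for each sex, split by survival."""
    # Rows without a recorded age cannot contribute to the density estimate.
    known_age = df.dropna(subset=["Age"])
    fig, ax = plt.subplots(figsize=(8, 5))
    # split=True puts the two 'Survived' groups on opposite halves of each violin.
    sns.violinplot(data=known_age, x="Sex", y="Age", hue="Survived",
                   split=True, ax=ax)
    ax.set_title("Age distribution by sex and survival")
    return ax


# Usage, assuming the Kaggle training file is saved locally as train.csv:
# titanic = pd.read_csv("train.csv")
# plot_age_by_sex(titanic)
# plt.show()
```

Returning the axes object instead of calling `plt.show()` inside the function keeps the plot easy to adjust or save afterwards.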

 

The resulting plot should look like this:

Figure: violin plot for the Titanic data

Follow-up analysis

We can also explore other fields in the dataset. The following code produces violin plots for social class (its introductory part converts the numeric values in ‘Pclass’ to nominal ones):
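One way this could be sketched, assuming the Kaggle coding of `Pclass` (1 = upper, 2 = middle, 3 = lower) and the same `Age` and `Survived` columns as before. The names `CLASS_NAMES`, `label_classes` and `plot_age_by_class` are hypothetical helpers introduced for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Kaggle describes Pclass as a proxy for socio-economic status.
CLASS_NAMES = {1: "Upper", 2: "Middle", 3: "Lower"}


def label_classes(df):
    """Return a copy of df with numeric 'Pclass' replaced by class names."""
    labelled = df.copy()
    labelled["Pclass"] = labelled["Pclass"].map(CLASS_NAMES)
    return labelled


def plot_age_by_class(df):
    """Draw the age distribution for each class, split by survival."""
    labelled = label_classes(df).dropna(subset=["Age"])
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.violinplot(data=labelled, x="Pclass", y="Age", hue="Survived",
                   split=True, order=["Upper", "Middle", "Lower"], ax=ax)
    ax.set_title("Age distribution by class and survival")
    return ax


# Usage, assuming the Kaggle training file is saved locally as train.csv:
# plot_age_by_class(pd.read_csv("train.csv"))
# plt.show()
```

Working on a copy leaves the original numeric `Pclass` column untouched, which is convenient if later steps still need it as a number.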

This should produce the following plot:

Figure: violin plot describing the relation of social class to survival rate

The plots are easy to produce and offer good insight into the general relations between different fields in the dataset. Color coding in the violin plot lets the user include an additional field, described in the automatically produced legend.

Description

In the Titanic data presented here, we can draw several tentative hypotheses about the passengers. Survival appears to have been relatively high among young boys, but not among young girls. High survival of young passengers also appears to be characteristic of the lower and middle classes, but not necessarily of the upper class.

This exploratory analysis with the use of violin plots lets us aim the analysis proper in a more precise direction, and can be very helpful at the outset of data analysis.
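As a first step of that more precise analysis, the hypotheses could be checked against actual survival rates. The sketch below assumes the same `Sex`, `Age` and `Survived` columns; the function name `survival_rate_by_group` and the age threshold of 15 for "young" passengers are arbitrary assumptions made for illustration.

```python
import pandas as pd


def survival_rate_by_group(df, child_age=15):
    """Mean survival per sex and age group (child vs. adult)."""
    # Passengers without a recorded age cannot be assigned to a group.
    known = df.dropna(subset=["Age"]).copy()
    is_child = known["Age"].lt(child_age)
    known["AgeGroup"] = is_child.map({True: "child", False: "adult"})
    # 'Survived' is coded 0/1, so its mean is the share of survivors.
    return known.groupby(["Sex", "AgeGroup"])["Survived"].mean()


# Usage, assuming the Kaggle training file is saved locally as train.csv:
# print(survival_rate_by_group(pd.read_csv("train.csv")))
```

Unlike the violin plot, this gives definite numbers, so it complements the visual impression instead of replacing it.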

Rafal Kleczek
