You might wonder what might be the reason for an Excel user to take recourse to Pandas at all. It is not difficult to find one. For example, Excel is at a loss when dealing with large datasets, those exceeding one million rows. With Pandas, you can easily perform an introductory analysis of a large file, get statistics for the entire piece, and divide it into smaller chunks of data you consider to be of particular importance in case you’d like to perform some additional operations in Excel.
Being able to automate tasks in Python can also be helpful in situations, when quick analysis of a large corpus of similar files is necessary. While such a task can be accomplished by recording a macro (or writing a script in VBA), automated code in Python is bound to accomplish the work much faster (some claim Pandas is as fast as C, though it might be just another urban legend).
In this post, I will be working on the example file ‘Indian Wholesale Price Index’ from the year 2005-2006, which you can download from my GitHub account.
Opening an Excel file in Pandas is easy. The following code should do the trick:
import pandas as pd
# Read the dataset:
df = pd.read_excel('yearlyf.xls')
# Get the description of the data:
# Get first five rows of the data:
Here is the output from the file I am working on:
Knowing the basic information about the file, we may wish to choose the information of our particular interest. Let us assume we would like to save a separate file, which comprises only the first five columns of the original file. This operation can be accomplished very easily:
df2 = df.loc[:, 'COMM_CODE': 'IN20072008']
The code produces another Excel file in the location from which we run Python, containing only the information of our interest.
Simple operations on Excel files in Pandas can be of much help in daily work with large datasets. Pandas will accomplish the desired actions much faster than Excel or other editors, and can read much larger files. It is a great alternative to other popular methods of performing repetitive tasks in Excel, like recording macros or writing scripts in VBA.