How not to write loops in Python

Beginning programmers, me included, often write code abounding with loops scanning documents time and again, which, when working with medium and large files, is not very efficient. Having suffered quite a bit because of this pattern, I would like to share a brief reflection on the ongoing struggle to make programs both simple, and more efficient.

Data

The data used here was published by Electric Reliability Council of Texas. The dataset was used in the wonderful course Data Wrangling with MongoDB on the teaching platform Udacity. You can download the entire dataset from the Udacity server. The code used here was authored by myself, which you can recognise by its general clumsiness (you can look up the original course for the solution of the course authors, which is neat and concise). I decided to take the general clumsiness, or simplicity, of my code as an advantage, used here for the sake of explanation.

The purpose of the function is to take the dataset, and to write a CSV file, with the values and dates of maximum load from the eight regions given in the dataset.

Code #1

The above code works, and returns a proper CSV file, but in a most inefficient manner. On my device, the program finished analysing the document and writing the file in 77 seconds. The file is also very difficult to read and interpret, which increases the possibility of committing mistakes.

The resulting CSV file is as follows:

The main problem with the above code, apart from its length and unreadability, is the number of times the function iterates through the entire file. While it might not be problematic in case of small files, it gets increasingly cumbersome the bigger the file gets. Here, with a file of medium size, producing even a simple output was greatly delayed.

We can solve this by avoiding the irrational use of a loop. We need to exchange the following piece of code:

with the much simplified version of it, producing the same output:

By using the  index  method, we avoid the delay in producing the output.

Code #2

The code, with the change implemented (and few other cosmetic changes could look like the following:

While the code is still far from acceptable, on my device it produces the desired output in slightly more than two seconds, which is a remarkable improvement.

Conclusion

When writing code, it is important to remember than creating a loop iterating through an entire file, while sometimes necessary, can greatly affect efficiency of the code. It is often possible to reduce the length of code, and increase its efficiency, utilising inbuilt methods instead of loops.

Rafal Kleczek

Leave a Reply