When reading in CSVs in Python, you have two main options:
- Use the built-in `csv` module
- Use `pandas`, Python's supreme data analysis module
Let's go through an example of each, using a CSV file of US political funding data I had lying around from a personal project.
Option 1: Python's csv module
```python
import csv

# My CSV has column headers, so I'll be reading it into
# a list of Python dicts with csv.DictReader.
# The csv docs recommend opening the file with newline="".
with open("sector_summary.csv", "r", newline="") as infile:
    csv_reader = csv.DictReader(infile)
    data = list(csv_reader)

print(len(data))
# 6070, that's a lot of records

# Let's just look at the first one
print(data[0])
"""
OrderedDict(
    [
        ('sector_name', 'Agribusiness'),
        ('sectorid', 'A'),
        ('indivs', '13350'),
        ('pacs', '20500'),
        ('total', '33850'),
        ('last_updated', '06/10/2019'),
        ('cycle', '2018'),
        ('cid', 'N00030910')
    ]
)
"""
```

That's it! Now you have access to the data in a list of dicts.
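You might notice the type of `data[0]` above is `OrderedDict`. That's just a version of Python's regular `dict` that remembers the order in which keys were inserted. `csv.DictReader` returned `OrderedDict` rows in Python 3.6 and 3.7; since Python 3.8 it returns plain dicts, which also preserve insertion order, so the distinction is no longer strictly necessary. For more information, see the OrderedDict vs dict link below.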
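One thing to watch with the csv module: every value comes back as a string, so `'33850'` has to be converted before you can do arithmetic with it. Below is a minimal sketch of converting the numeric columns while reading. It assumes `indivs`, `pacs`, `total` and `cycle` are the numeric columns (my reading of the sample row above), and it reads an in-memory copy of that one row through `io.StringIO` so it runs without the real file. `load_rows` and `NUMERIC_COLUMNS` are hypothetical names, not part of the csv module.

```python
import csv
import io

# A one-row stand-in for sector_summary.csv, using the record printed above.
# io.StringIO lets the example run without the real file.
SAMPLE_CSV = """sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid
Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910
"""

# Hypothetical: the columns assumed to hold whole numbers.
NUMERIC_COLUMNS = ("indivs", "pacs", "total", "cycle")


def load_rows(infile):
    """Read rows as dicts and convert the numeric columns from str to int."""
    rows = []
    for row in csv.DictReader(infile):
        for column in NUMERIC_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["indivs"] + rows[0]["pacs"])  # 33850, now an int sum rather than string concatenation
```

Doing the conversion in one place while loading keeps the rest of the script from having to remember which fields are still strings.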
Option 2: pandas
```python
import pandas as pd

data = pd.read_csv("sector_summary.csv")

# pandas loads the data into a DataFrame. Unlike with a list, data[0]
# would look up a *column* named 0 (a KeyError here), so use
# data.iloc[0] for the first row, or data.head(1) to preview it.
print(data.head(1))
"""
    sector_name  sectorid  indivs   pacs  total  last_updated  cycle        cid
0  Agribusiness         A   13350  20500  33850    06/10/2019   2018  N00030910
"""
```

That's it. Very simple, and now you have your data loaded into a pandas DataFrame, which has a lot of powerful functionality built into it.
Conclusion
I use Python’s csv module most of the time because my usual work consists mainly of ETL, and doesn’t require the analysis capabilities (or heavy import weight) that come with pandas. However, if you’re doing any kind of statistical analysis, pandas probably has what you need.
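To make that trade-off concrete, here is a sketch of one small aggregation, total funding per sector, done both ways. Only the first data row comes from the file above; the other two rows are invented values so the grouping has something to combine. `totals_with_csv` and `totals_with_pandas` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict

import pandas as pd

# Only the first data row comes from sector_summary.csv; the other two
# are invented values so the grouping has something to combine.
SAMPLE_CSV = """sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid
Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910
Agribusiness,A,1000,2000,3000,06/10/2019,2018,X00000001
Defense,D,500,1500,2000,06/10/2019,2018,X00000002
"""


def totals_with_csv(text):
    """Process rows one at a time with csv.DictReader and sum 'total' per sector."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        # Values are strings, so convert before adding
        totals[row["sector_name"]] += int(row["total"])
    return dict(totals)


def totals_with_pandas(text):
    """Let pandas handle both the type inference and the grouping."""
    df = pd.read_csv(io.StringIO(text))
    return df.groupby("sector_name")["total"].sum().to_dict()


print(totals_with_csv(SAMPLE_CSV))  # {'Agribusiness': 36850, 'Defense': 2000}
print(totals_with_pandas(SAMPLE_CSV))
```

The csv version only needs the standard library and handles one row at a time, which suits simple ETL steps; the pandas version is shorter and grows more naturally once you need more than a sum.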
Helpful Links
- Context Managers: Python's `with` Statement – from this very site
- Pandas User Guide – from pandas itself
- csv.DictReader class – Python official docs
- OrderedDict vs dict – RealPython