When reading in CSVs in Python, you have two main options:
- Use the built-in csv module
- Use pandas, Python’s supreme data analysis module
Let’s go through an example of each, using a spreadsheet of US political funding data I have lying around from a personal project.
Option 1: Python’s csv module
import csv
# My CSV has column headers, so I'll read it into
# a list of Python dicts with csv.DictReader.
# newline="" is what the csv docs recommend when opening files.
with open("sector_summary.csv", "r", newline="") as infile:
    csv_reader = csv.DictReader(infile)
    data = list(csv_reader)
print(len(data))
# 6070, that's a lot of records
# let's just look at the first one
print(data[0])
"""
OrderedDict(
    [
        ('sector_name', 'Agribusiness'),
        ('sectorid', 'A'),
        ('indivs', '13350'),
        ('pacs', '20500'),
        ('total', '33850'),
        ('last_updated', '06/10/2019'),
        ('cycle', '2018'),
        ('cid', 'N00030910')
    ]
)
"""
That’s it! Now you have access to the data in a list of dicts.
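You might notice that the type of data[0] above is OrderedDict. That’s just a version of Python’s regular dict class that guarantees the order of its keys. It is no longer strictly necessary: regular dicts have preserved insertion order since Python 3.7, and since Python 3.8 csv.DictReader returns plain dicts. For more information, see the OrderedDict vs dict article under Helpful Links.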
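Notice also that every value in the output above is a string: the csv module does no type conversion for you. Here is a minimal sketch of converting the numeric columns by hand. It uses an in-memory copy of the first row instead of the real file, and the INT_COLUMNS tuple and parse_row helper are names I made up for illustration.

```python
import csv
import io

# In-memory stand-in for sector_summary.csv, holding only the first row shown above.
SAMPLE = (
    "sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid\n"
    "Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910\n"
)

# Hypothetical choice of which columns to treat as integers.
INT_COLUMNS = ("indivs", "pacs", "total", "cycle")


def parse_row(row):
    """Return a plain dict with the numeric columns converted to int."""
    parsed = dict(row)  # works whether row is a dict or an OrderedDict
    for column in INT_COLUMNS:
        parsed[column] = int(parsed[column])
    return parsed


rows = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
print(type(rows[0]["total"]).__name__, rows[0]["total"])  # int 33850
```

With a real file you would swap io.StringIO(SAMPLE) for the open file object; the rest should stay the same.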
Option 2: Read the CSV with pandas
import pandas as pd
data = pd.read_csv("sector_summary.csv")
# pandas loads data into a `DataFrame`, where data[0] would look up
# a column labeled 0 rather than the first row, so use head() instead
print(data.head(1))
"""
    sector_name  sectorid  indivs   pacs  total  last_updated  cycle        cid
0  Agribusiness         A   13350  20500  33850    06/10/2019   2018  N00030910
"""
That’s it. Very simple, and now you have your data loaded into a pandas DataFrame, which has a lot of powerful functionality built into it.
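As a small taste of that functionality, here is a sketch that again uses an in-memory copy of the first row rather than the real file. It assumes pandas infers integer types for the numeric columns, which is its default behavior for clean numeric data.

```python
import io

import pandas as pd

# In-memory stand-in for sector_summary.csv, holding only the first row shown above.
SAMPLE = (
    "sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid\n"
    "Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910\n"
)

df = pd.read_csv(io.StringIO(SAMPLE))

# Select a row by position with .iloc, since df[0] would look for a column labeled 0.
first = df.iloc[0]
print(first["sector_name"])  # Agribusiness

# Numeric columns are parsed on load, so arithmetic works without manual conversion
# and is applied to every row at once.
totals_match = (df["indivs"] + df["pacs"] == df["total"]).all()
print(bool(totals_match))  # True
```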
Conclusion
I use Python’s csv module most of the time because my usual work consists mainly of ETL, and doesn’t require the analysis capabilities (or heavy import weight) that come with pandas. However, if you’re doing any kind of statistical analysis, pandas probably has what you need.
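For the ETL case, here is a rough sketch of the kind of thing the csv module handles well on its own: rows are streamed from one CSV to another, one at a time, so the whole file never has to sit in memory. The in-memory buffers stand in for real files, and the transform function and its pac_share column are invented for illustration.

```python
import csv
import io

# In-memory stand-in for sector_summary.csv, holding only the first row shown above.
SAMPLE = (
    "sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid\n"
    "Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910\n"
)


def transform(row):
    """Hypothetical transform step: add the share of the total that came from PACs."""
    out = dict(row)
    out["pac_share"] = round(int(row["pacs"]) / int(row["total"]), 4)
    return out


source = io.StringIO(SAMPLE)   # stands in for open("sector_summary.csv", newline="")
destination = io.StringIO()    # stands in for an output file opened with newline=""

reader = csv.DictReader(source)
writer = csv.DictWriter(destination, fieldnames=reader.fieldnames + ["pac_share"])
writer.writeheader()
for row in reader:             # extract and load one row at a time
    writer.writerow(transform(row))

print(destination.getvalue())
```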
Helpful Links
- Context Managers: Python’s with Statement – from this very site
- Pandas User Guide – from pandas itself
- csv.DictReader class – Python official docs
- OrderedDict vs dict – RealPython