
Justin Joyce

Practical tips and tutorials about software development.


Reading CSVs in Python

Posted Dec 21, 2022, updated May 15, 2024

When reading CSVs in Python, you have two main options:

  1. Use the built-in csv module
  2. Use pandas, Python’s supreme (third-party) data analysis library

Let’s go through an example of each, using a spreadsheet of US political funding data I have lying around from a personal project.

Option 1: read the CSV with Python’s csv module

import csv

# My CSV has column headers, so I'll be reading it into
# a list of Python dicts with csv.DictReader.
# newline="" is what the csv docs recommend when opening CSV files.
with open("sector_summary.csv", "r", newline="") as infile:
    csv_reader = csv.DictReader(infile)
    data = list(csv_reader)

print(len(data))
# 6070, that's a lot of records

# let's just look at the first one
print(data[0])

"""
OrderedDict(
    [
        ('sector_name', 'Agribusiness'),
        ('sectorid', 'A'),
        ('indivs', '13350'),
        ('pacs', '20500'),
        ('total', '33850'),
        ('last_updated', '06/10/2019'),
        ('cycle', '2018'),
        ('cid', 'N00030910')
    ]
)
"""

That’s it! Now you have access to the data in a list of dicts.
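One thing to watch: as the output above shows, csv.DictReader hands back every value as a string, including '33850'. Below is a minimal sketch of converting the numeric columns yourself, assuming the columns listed in the hypothetical INT_COLUMNS tuple are the numeric ones. It reads from an inline sample (via io.StringIO) instead of the real file so it runs on its own; the load_rows helper is hypothetical, not part of the csv module.

```python
import csv
import io

# Hypothetical inline sample mirroring the columns shown above;
# in practice you'd open("sector_summary.csv", newline="") instead.
SAMPLE = """sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid
Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910
"""

# Assumption: these are the columns that hold whole numbers
INT_COLUMNS = ("indivs", "pacs", "total", "cycle")


def load_rows(text):
    """Read CSV text into dicts, converting the numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # csv.DictReader yields every value as a str, so convert explicitly
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
print(rows[0]["total"], rows[0]["cid"])  # 33850 N00030910
```

ID-like columns such as cid are left alone on purpose, since they are labels rather than numbers.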

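You might notice the type of data[0] above is OrderedDict. That’s a subclass of Python’s regular dict that remembers the order its keys were inserted in. csv.DictReader returned OrderedDict rows on Python 3.6 and 3.7, but since Python 3.8 it returns plain dicts, and plain dicts have preserved insertion order since Python 3.7 anyway, so OrderedDict is no longer strictly necessary here. For more information, see the RealPython comparison of OrderedDict vs dict in the Helpful Links below.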
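To check this on your own interpreter, here is a small sketch using a hypothetical three-column inline sample. Whatever the row type, the keys come back in header order, and calling dict() on a row keeps that order.

```python
import csv
import io

# Hypothetical inline sample with just a few of the columns above
SAMPLE = "sector_name,sectorid,total\nAgribusiness,A,33850\n"

row = next(csv.DictReader(io.StringIO(SAMPLE)))

# 'dict' on Python 3.8+, 'OrderedDict' on 3.6 and 3.7
print(type(row).__name__)

# Keys follow the header row's order either way
print(list(row))  # ['sector_name', 'sectorid', 'total']

# Converting an OrderedDict row to a plain dict keeps that order too
plain = dict(row)
print(list(plain) == list(row))  # True
```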

Option 2: read the CSV with pandas

import pandas as pd

data = pd.read_csv("sector_summary.csv")

# pandas loads data into a DataFrame, where data[0] looks up a
# column labeled 0 rather than the first row; use data.iloc[0]
# for positional row access, or head() to preview the first rows
print(data.head(1))
"""
    sector_name sectorid  indivs   pacs  total last_updated  cycle        cid
0  Agribusiness        A   13350  20500  33850   06/10/2019   2018  N00030910
"""

That’s it. Very simple, and now you have your data loaded into a pandas DataFrame, which has a lot of powerful functionality built into it.
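Unlike the csv module, pandas infers column types when it reads the file. Below is a minimal sketch of a few read_csv options you might reach for with this data, again using a hypothetical inline sample instead of the real file. Treating sectorid and cid as strings and parsing last_updated as month/day/year are assumptions based on the sample row above.

```python
import io

import pandas as pd

# Hypothetical inline sample mirroring the columns shown above;
# in practice you'd call pd.read_csv("sector_summary.csv", ...)
SAMPLE = """sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid
Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910
"""

data = pd.read_csv(
    io.StringIO(SAMPLE),
    # keep ID-like columns as strings even if they happen to look numeric
    dtype={"sectorid": str, "cid": str},
)

# pandas infers a numeric dtype for columns like total
print(data["total"].dtype)  # int64

# Dates stay as strings unless you parse them explicitly
data["last_updated"] = pd.to_datetime(data["last_updated"], format="%m/%d/%Y")

# Positional row access uses .iloc rather than data[0]
first = data.iloc[0]
print(first["sector_name"], first["total"])  # Agribusiness 33850
```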

Conclusion

I use Python’s csv module most of the time because my usual work consists mainly of ETL, and doesn’t require the analysis capabilities (or heavy import weight) that come with pandas. However, if you’re doing any kind of statistical analysis, pandas probably has what you need.
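For ETL work, the csv module also lets you stream rows from input to output one at a time instead of loading everything into memory first. Here is a minimal sketch that pairs csv.DictReader with csv.DictWriter to keep only rows with a positive total and a subset of columns. The filter rule, the kept columns, the filter_nonzero helper, and the second sample row are all hypothetical, chosen just for illustration.

```python
import csv
import io

# Hypothetical inline input; a real job would open files instead, e.g.
# open("sector_summary.csv", newline="") for reading and
# open("out.csv", "w", newline="") for writing
SAMPLE = """sector_name,sectorid,indivs,pacs,total,last_updated,cycle,cid
Agribusiness,A,13350,20500,33850,06/10/2019,2018,N00030910
Agribusiness,A,0,0,0,06/10/2019,2018,N00000001
"""


def filter_nonzero(infile, outfile, fields=("sector_name", "total", "cid")):
    """Stream rows from infile to outfile, keeping rows with total > 0."""
    reader = csv.DictReader(infile)
    # extrasaction="ignore" silently drops columns not listed in fields
    writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for row in reader:  # only one row is held in memory at a time
        if int(row["total"]) > 0:
            writer.writerow(row)
            kept += 1
    return kept


out = io.StringIO()
kept = filter_nonzero(io.StringIO(SAMPLE), out)
print(kept)  # 1
print(out.getvalue())
```

Because nothing is accumulated in a list, the same function works on files far larger than the 6070 rows here.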


Helpful Links

  • Context Managers: Python’s with Statement – from this very site
  • Pandas User Guide – from pandas itself
  • csv.DictReader class – Python official docs
  • OrderedDict vs dict – RealPython

Filed Under: Python


Copyright © 2025 · Contact me at justin [at] {this domain}
