Justin Joyce

Practical tips and tutorials about software development.

Python sets

Posted Jul 30, 2023 — Updated Jan 10, 2024

Sets are one of Python’s built-in types, and they’re very useful for deduplicating and comparing collections of data. This post covers the most commonly used set functionality: creating sets, adding and removing members, and comparing one set against another.

Here are some jump links to make life easier:

  • Creating a set
  • Check if a set contains a member
  • Add members to a set
  • Remove members from a set
  • Determine if a list has duplicate values
  • Determine the difference between sets
  • Finding set intersections
  • Supersets and subsets
  • Combine sets
  • The Frozenset class—Immutable sets

Creating a set

There are a few options:

# Create an empty set
set_one = set()

# Create a set from an existing list
set_two = set([1, 2, 3])

# Create a set with single curly brackets
set_three = {1, 2, 3}

# If you use curly brackets, you must include at least one
# element. Empty curly brackets create a dict, not a set
not_a_set = {}
type(not_a_set)
# dict
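
The set() constructor is not limited to lists. As a small sketch (the variable names are only for illustration), it accepts any iterable, and Python also supports set comprehensions:

```python
# set() accepts any iterable, not only lists
letters = set("hello")          # a string is iterated character by character
from_tuple = set((1, 2, 2, 3))  # duplicate values collapse into one

# A set comprehension builds a set from an expression
squares = {n * n for n in range(-2, 3)}

print(letters == {"h", "e", "l", "o"})  # True
print(from_tuple == {1, 2, 3})          # True
print(squares == {0, 1, 4})             # True
```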

Check if a set contains a member

You can check for membership with classic Python in and not in:

my_set = set([1, 2, 3])

1 in my_set
# True

1 not in my_set
# False
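
Membership checks are often used to filter another collection. Here is a minimal sketch, where the role names and variables are made up for illustration:

```python
# Hypothetical data: roles that are allowed, and roles that were requested
allowed = {"admin", "editor"}
requested = ["admin", "guest", "editor", "guest"]

# Keep or reject each requested role based on set membership
granted = [role for role in requested if role in allowed]
rejected = [role for role in requested if role not in allowed]

print(granted)   # ['admin', 'editor']
print(rejected)  # ['guest', 'guest']
```

Looking up a value in a set takes roughly constant time on average, so this pattern stays fast even when the set of allowed values is large.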

Add members to a set

Add members one at a time

You can add individual members to a set via set.add():

my_set = {1, 2, 3}
my_set.add(4)
print(my_set)
# {1, 2, 3, 4}

If the element you’re trying to add is already in the set, .add() will do nothing:

my_set = {1, 2, 3}
my_set.add(2)
print(my_set)
# {1, 2, 3}

Or add members in bulk

To add more than one element at once, use set.update() with a list (or any other iterable):

my_set = {1, 2, 3}
my_set.update([4, 5])
print(my_set)
# {1, 2, 3, 4, 5}

Like add(), update() will not add any duplicate values:

my_set = {1, 2, 3}
my_set.update([2, 3, 4])
print(my_set)
# {1, 2, 3, 4}
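
As a quick sketch of how flexible update() is: it takes any number of iterables, and the |= operator is an in-place shorthand when the right-hand side is itself a set.

```python
my_set = {1, 2, 3}

# update() accepts several iterables of any kind in one call
my_set.update((4, 5), range(5, 8), "ab")
print(my_set == {1, 2, 3, 4, 5, 6, 7, "a", "b"})  # True

# |= is the in-place shorthand, but it requires a set on the right
my_set |= {8}
print(8 in my_set)  # True
```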

Remove members from a set

There are several options here:

  1. set.discard(n) – removes n from the set, does nothing if n isn’t present. Returns None.
  2. set.remove(n) – removes n from the set, raises a KeyError if n isn’t present. Returns None.
  3. set.pop() – removes an arbitrary element of the set (you don’t get to choose which one). Raises a KeyError if the set is already empty. Returns the element which was removed.
  4. set.clear() – empties the entire set. Returns None.

my_set = {1, 2, 3, 4, 5}
my_set.discard(3) # {1, 2, 4, 5}
my_set.discard(3) # {1, 2, 4, 5}
my_set.remove(2) # {1, 4, 5}
my_set.remove(2) # KeyError: 2

val = my_set.pop()
print(val, my_set)
# For example: 1 {4, 5} (which element is popped is arbitrary)

my_set.clear() # set()
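
To see the different failure modes in a script that runs from top to bottom, here is a small sketch. The set contents are made up for illustration:

```python
queue = {"a", "b", "c"}

# remove() raises KeyError for a missing value, so you can react to it
try:
    queue.remove("z")
except KeyError:
    print("'z' was not in the set")

# pop() until the set is empty; the order of removal is arbitrary
drained = []
while queue:
    drained.append(queue.pop())

print(sorted(drained))  # ['a', 'b', 'c']
print(queue)            # set()
```

An empty set prints as set() rather than {}, because {} is reserved for an empty dict.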

Determine if a list has duplicate values

This comes in handy often when doing quick investigation work:

my_list = [1, 2, 3, 4, 2, 3, 6]

# Set members are always distinct
# This will automatically dedupe the list
my_set = set(my_list)

len(my_list) # 7
len(my_set) # 5
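
The length comparison above can be wrapped in a small helper. This is only a sketch: has_duplicates and find_duplicates are hypothetical names, and both assume the list elements are hashable.

```python
def has_duplicates(values):
    """Return True if any value appears more than once."""
    return len(set(values)) != len(values)


def find_duplicates(values):
    """Return the set of values that appear more than once."""
    seen = set()
    dupes = set()
    for value in values:
        if value in seen:
            dupes.add(value)
        else:
            seen.add(value)
    return dupes


my_list = [1, 2, 3, 4, 2, 3, 6]
print(has_duplicates(my_list))             # True
print(find_duplicates(my_list) == {2, 3})  # True
print(has_duplicates([1, 2, 3]))           # False
```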

Determine the difference between sets

There are two … different ways to do this: difference and symmetric_difference.

Using set.difference()

Calling a.difference(b) will give you a new set containing the elements that are in a but not in b. Order matters here, so a.difference(b) will give different results from b.difference(a):

a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a.difference(b)
# {1}

# To get values unique to b, switch the order
unique_to_b = b.difference(a)
# {4}

Python also gives us a shorthand for set.difference, the - sign:

a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a - b
# {1}
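
One practical distinction between the method and the operator, sketched below: difference() accepts any iterables (and more than one of them), while - only works between sets.

```python
a = {1, 2, 3, 4, 5}

# The method accepts multiple arguments, and they need not be sets
remaining = a.difference([1, 2], (5,))
print(remaining == {3, 4})  # True

# The operator raises TypeError if the right-hand side is not a set
try:
    a - [1, 2]
    operator_failed = False
except TypeError:
    operator_failed = True

print(operator_failed)  # True
```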

Using set.symmetric_difference()

Symmetric difference between sets is defined as all elements in either set which are not in both sets. Using the same a and b:

a = {1, 2, 3}
b = {2, 3, 4}

# order doesn't matter for symmetric_difference
a.symmetric_difference(b)
# {1, 4}

# This also has a shorthand operator: ^
a ^ b
# {1, 4}

I’m not sure I’d recommend using the ^ operator here, as it’s not very commonly seen and could confuse readers of your code.
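
If the definition feels abstract, it may help to see that symmetric difference can be rebuilt from the other set operations. A small sketch using the same a and b (the | and & operators are union and intersection):

```python
a = {1, 2, 3}
b = {2, 3, 4}

sym = a.symmetric_difference(b)

# Everything unique to a, plus everything unique to b
print(sym == (a - b) | (b - a))  # True

# Everything in either set, minus what they share
print(sym == (a | b) - (a & b))  # True
```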

Bonus: set.isdisjoint()

This will return True if two sets have no common elements:

a = {1, 2, 3}
b = {4, 5, 6}
a.isdisjoint(b)
# True

From the Python docs: Sets are disjoint if and only if their intersection is the empty set.
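
Here is a short sketch of that definition in practice. The field names are invented for illustration; note that isdisjoint() accepts any iterable, not only sets.

```python
# Hypothetical example: make sure user-supplied fields avoid reserved names
reserved = {"id", "type"}
user_fields = ["name", "email"]

no_conflict = reserved.isdisjoint(user_fields)
print(no_conflict)  # True

conflict = reserved.isdisjoint(["name", "id"])
print(conflict)  # False

# Same answer as checking for an empty intersection
print(no_conflict == (len(reserved.intersection(user_fields)) == 0))  # True
```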

Bonus: compare dictionary keys

This has come in handy for me when investigating large dicts. Since a Python dict is technically an iterable, it can be passed into a set(), which is a quick way to see if two objects have the same shape:

person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}

# It seems obvious with these small dicts
# but when there are dozens or hundreds of keys
# this comes in handy
set_one = set(person) # {"name"}
set_two = set(not_a_person) # {"name", "model_year"}

set_one == set_two # False
set_one.symmetric_difference(set_two) # {"model_year"}

Note that above, only the dict keys are passed into the set. That is because iterating over a Python dict yields only its keys. To compare keys and values together, you need dict.items().
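
As a sketch of two related approaches using the same dicts: the view returned by dict.keys() already supports set operators, and dict.items() can be turned into a set of (key, value) pairs as long as every value is hashable.

```python
person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}

# dict.keys() returns a set-like view, so set operators work on it directly
shared_keys = person.keys() & not_a_person.keys()
extra_keys = not_a_person.keys() - person.keys()
print(shared_keys == {"name"})       # True
print(extra_keys == {"model_year"})  # True

# Comparing (key, value) pairs; this assumes all values are hashable
changed = set(person.items()) ^ set(not_a_person.items())
print(("name", "justin") in changed)  # True
print(len(changed))                   # 3
```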

Finding set intersections

Use the very appropriately named intersection() to get a new set containing the values common to both sets:

a = {1, 2, 3}
b = {2, 3, 4}
a.intersection(b)
# {2, 3}

# Intersection also has a shorthand operator: &
a & b
# {2, 3}
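
Intersection is not limited to two sets. A minimal sketch, where a, b, c and groups are illustrative names:

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}

# Pass several sets to the method, or chain the operator
print(a.intersection(b, c) == {3, 4})  # True
print(a & b & c == {3, 4})             # True

# With a list of sets, unpack it (this assumes the list is not empty)
groups = [a, b, c]
common = set.intersection(*groups)
print(common == {3, 4})  # True
```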

Supersets and subsets

Use set.issuperset() or set.issubset() [1]:

a = {1, 2, 3}
b = {1, 2}

a.issuperset(b) # True
b.issubset(a) # True

# Order matters
a.issubset(b) # False
b.issuperset(a) # False
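
These checks also have operator forms, sketched below. The strict versions (< and >) additionally require the two sets to be different.

```python
a = {1, 2, 3}
b = {1, 2}

print(b <= a)  # True, same as b.issubset(a)
print(a >= b)  # True, same as a.issuperset(b)

# Every set is a subset of itself, but not a strict subset
print(a <= a)  # True
print(a < a)   # False
print(b < a)   # True
```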

Combine two (or more) sets

You can use the union() method to combine sets. It returns a new set and leaves the originals unchanged:

a = {1, 2, 3}
b = {3, 4, 5}
a.union(b)
# {1, 2, 3, 4, 5}

# This has a shorthand also: |
a | b
# {1, 2, 3, 4, 5}
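
For more than two sets, here is a short sketch (the names c and groups are illustrative):

```python
a = {1, 2, 3}
b = {3, 4, 5}
c = {5, 6}

# Pass several sets to the method, or chain the operator
print(a.union(b, c) == {1, 2, 3, 4, 5, 6})  # True
print(a | b | c == {1, 2, 3, 4, 5, 6})      # True

# Starting from an empty set also works when the list of sets is empty
groups = [a, b, c]
combined = set().union(*groups)
print(combined == {1, 2, 3, 4, 5, 6})  # True
print(set().union(*[]))                # set()

# None of the original sets were modified
print(a == {1, 2, 3})  # True
```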

Frozenset – Immutable sets

The frozenset class is a set which is immutable after it’s created. Once initialized, nothing can ever be added to or removed from a frozen set:

a = frozenset([1, 2, 3])
a.add(2)
# AttributeError: 'frozenset' object has no attribute 'add'

a.clear()
# AttributeError: 'frozenset' object has no attribute 'clear'

This immutability allows frozen sets to be hashable, meaning they can be used as members of other sets or as keys in a dictionary.
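
A minimal sketch of what hashability buys you. The distances dict and its contents are made up for illustration:

```python
# A frozenset can be a dictionary key; element order does not matter
distances = {frozenset({"A", "B"}): 5}
a_to_b = distances[frozenset({"B", "A"})]
print(a_to_b)  # 5

# A frozenset can also be a member of another set
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2, because the first two members are equal

# A regular set cannot, because it is mutable and therefore unhashable
try:
    bad = {{1, 2}}
    plain_set_rejected = False
except TypeError:
    plain_set_rejected = True
print(plain_set_rejected)  # True

# Non-mutating operations still work and return a new frozenset
combined = frozenset({1, 2}) | {3}
print(combined == frozenset({1, 2, 3}))  # True
print(type(combined).__name__)           # frozenset
```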

More

Believe it or not, there are more set methods, and more shorthand operators which I didn’t cover here. To learn more, check out the official Python docs.


Notes

  1. I’m not sure why Python broke with its usual snake_case for issuperset, issubset, and isdisjoint. It makes them harder to read and write.

Filed Under: Python

