Python sets | Justin Joyce

Sets are one of Python’s built-in types, and they’re very useful for deduplicating and comparing collections of data. Sets have tons of useful built-in functionality, and this post covers a lot.

Here are some jump links to make life easier:

Creating a set
Check if a set contains a member
Add members to a set
Remove members from a set
Determine if a list has duplicate values
Determine the difference between sets
Finding set intersections
Supersets and subsets
Combine sets
The Frozenset class—Immutable sets

Creating a set

There are a few options:

# Create an empty set
set_one = set()

# Create a set from an existing list
set_two = set([1, 2, 3])

# Create a set with single curly brackets
set_three = {1, 2, 3}

# If you use the curly bracket syntax, you must include at least
# one element. Empty curly brackets create a dict, not a set
not_a_set = {}
type(not_a_set)
# dict
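As a small extension of the options above: set() accepts any iterable, not only a list, and Python also supports set comprehensions. The snippet below is a minimal sketch; the variable names are illustrative.

```python
# set() accepts any iterable: strings, ranges, tuples, generators
letters = set("hello")        # the repeated 'l' collapses to one member
evens = set(range(0, 10, 2))  # members 0, 2, 4, 6, 8

# A set comprehension uses curly brackets with a single expression
# (no "key: value" pair, which would make it a dict comprehension)
squares = {n * n for n in [-2, -1, 0, 1, 2]}

print(sorted(letters))  # ['e', 'h', 'l', 'o']
print(sorted(evens))    # [0, 2, 4, 6, 8]
print(sorted(squares))  # [0, 1, 4]
```

The output is sorted because sets are unordered, so printing one directly may show the members in a different order.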

Check if a set contains a member

You can check for membership with classic Python in and not in:

my_set = set([1, 2, 3])

1 in my_set
# True

1 not in my_set
# False
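One place membership checks tend to show up is filtering one collection against another. Here is a rough sketch, assuming a hypothetical allowlist of IDs (allowed_ids and incoming are made-up names and data):

```python
allowed_ids = {101, 205, 309}    # hypothetical allowlist
incoming = [101, 102, 205, 999]  # hypothetical values to check

# "in" and "not in" work the same way inside a comprehension
accepted = [i for i in incoming if i in allowed_ids]
rejected = [i for i in incoming if i not in allowed_ids]

print(accepted)  # [101, 205]
print(rejected)  # [102, 999]
```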

Add members to a set

Add members one at a time

You can add individual members to a set via set.add():

my_set = {1, 2, 3}
my_set.add(4)
print(my_set)
# {1, 2, 3, 4}

If the element you’re trying to add is already in the set, .add() will do nothing:

my_set = {1, 2, 3}
my_set.add(2)
print(my_set)
# {1, 2, 3}

Or add members in bulk

To add more than one element at once, use set.update() with a list (or any other iterable):

my_set = {1, 2, 3}
my_set.update([4, 5])
print(my_set)
# {1, 2, 3, 4, 5}

Update, like add, will not add any duplicate values:

my_set = {1, 2, 3}
my_set.update([2, 3, 4])
print(my_set)
# {1, 2, 3, 4}
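update() is not limited to a single list. As a sketch of two variations: it takes any number of iterables, and the |= operator does the same job when the right-hand side is already a set.

```python
my_set = {1, 2, 3}

# update() accepts several iterables of different types in one call
my_set.update([4, 5], (6,), range(7, 9))
print(sorted(my_set))  # [1, 2, 3, 4, 5, 6, 7, 8]

# |= is the in-place operator form; its right-hand side must be a set
my_set |= {8, 9}
print(sorted(my_set))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

One thing to watch: a string is an iterable too, so my_set.update("ab") adds "a" and "b" as separate members rather than the string "ab".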

Remove members from a set

There are several options here:

set.discard(n) – removes n from the set, does nothing if n isn’t present. Returns None.
set.remove(n) – removes n from the set, raises a KeyError if n isn’t present. Returns None.
set.pop() – removes and returns an arbitrary element of the set (not a random one: you can’t choose which element comes out, and you shouldn’t rely on the order). Raises a KeyError if the set is already empty.
set.clear() – empties the entire set. Returns None.

my_set = {1, 2, 3, 4, 5}
my_set.discard(3) # {1, 2, 4, 5}
my_set.discard(3) # {1, 2, 4, 5}
my_set.remove(2) # {1, 4, 5}
my_set.remove(2) # KeyError: 2

val = my_set.pop()
print(val, my_set)
# 1 {4, 5} (pop() may remove a different element)

my_set.clear() # set()
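Because the snippet above stops at the first KeyError if run top to bottom, here is a runnable sketch of the same methods with the error handled. The drained list is only there for illustration.

```python
my_set = {1, 2, 3}

# discard() is the quiet option when a missing element is fine
my_set.discard(99)

# remove() is the loud option when a missing element signals a bug
try:
    my_set.remove(99)
except KeyError as err:
    missing = err.args[0]
    print(f"{missing} was not in the set")  # 99 was not in the set

# pop() in a loop drains the set one arbitrary element at a time
drained = []
while my_set:
    drained.append(my_set.pop())

print(sorted(drained))  # [1, 2, 3]
print(my_set)           # set()
```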

Determine if a list has duplicate values

This comes in handy often when doing quick investigation work:

my_list = [1, 2, 3, 4, 2, 3, 6]

# Set members are always distinct
# This will automatically dedupe the list
my_set = set(my_list)

len(my_list) # 7
len(my_set) # 5
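The length comparison can be wrapped in a small helper. In the sketch below, has_duplicates and find_duplicates are hypothetical helper names, not built-ins, and find_duplicates uses collections.Counter from the standard library to report which values repeat.

```python
from collections import Counter


def has_duplicates(items):
    """Return True if any value in a list (or tuple) appears more than once."""
    return len(set(items)) != len(items)


def find_duplicates(items):
    """Return the set of values that appear more than once."""
    counts = Counter(items)
    return {value for value, count in counts.items() if count > 1}


my_list = [1, 2, 3, 4, 2, 3, 6]
print(has_duplicates(my_list))           # True
print(sorted(find_duplicates(my_list)))  # [2, 3]
print(has_duplicates([1, 2, 3]))         # False
```

Both helpers assume the items are hashable, since that is a requirement for set members and Counter keys.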

Determine the difference between sets

There are two … different ways to do this: difference and symmetric_difference.

Using set.difference()

Calling a.difference(b) will give you a new set containing the elements that are in a but not in b. Order matters here, so a.difference(b) will give different results from b.difference(a):

a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a.difference(b)
# {1}

# To get values unique to b, switch the order
unique_to_b = b.difference(a)
# {4}

Python also gives us a shorthand for set.difference, the - sign:

a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a - b
# {1}
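The method and the operator are not fully interchangeable. As a quick sketch of the difference: difference() accepts any iterables (and more than one), while - only works between sets.

```python
a = {1, 2, 3, 4}

# The method accepts lists, tuples, and other iterables, several at once
remaining = a.difference([2], (3,))
print(sorted(remaining))  # [1, 4]

# The operator requires a set on both sides
try:
    a - [2]
    operator_failed = False
except TypeError:
    operator_failed = True

print(operator_failed)  # True

# Neither form changes the original set
print(sorted(a))  # [1, 2, 3, 4]
```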

Using set.symmetric_difference()

Symmetric difference between sets is defined as all elements in either set which are not in both sets. Using the same a and b:

a = {1, 2, 3}
b = {2, 3, 4}

# order doesn't matter for symmetric_difference
a.symmetric_difference(b)
# {1, 4}

# This also has a shorthand operator: ^
a ^ b
# {1, 4}

I’m not sure I’d recommend using the ^ operator here as it’s not very commonly-seen and could confuse readers of your code.
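If the definition feels abstract, it may help to build the symmetric difference by hand. This sketch uses the | (union) and & (intersection) operators, which are covered later in this post.

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Elements unique to a, plus elements unique to b
by_hand = (a - b) | (b - a)

# Equivalently: everything in either set, minus what they share
also_by_hand = (a | b) - (a & b)

print(sorted(by_hand))                  # [1, 4]
print(by_hand == also_by_hand)          # True
print(by_hand == a.symmetric_difference(b))  # True
```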

Bonus: set.isdisjoint()

This will return True if two sets have no common elements:

a = {1, 2, 3}
b = {4, 5, 6}
a.isdisjoint(b)
# True

From the Python docs: Sets are disjoint if and only if their intersection is the empty set.
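The docs' definition can be checked directly. A small sketch, with c added as an extra illustrative set:

```python
a = {1, 2, 3}
b = {4, 5, 6}
c = {3, 4}

# isdisjoint() agrees with "the intersection is the empty set"
print(a.isdisjoint(b), a.intersection(b) == set())  # True True
print(a.isdisjoint(c), a.intersection(c) == set())  # False False

# Like the other set methods, it accepts any iterable
print(a.isdisjoint([7, 8]))  # True
```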

Bonus: compare dictionary keys

This has come in handy for me when investigating large dicts. Since a Python dict is technically an iterable, it can be passed to set(), which is a quick way to see if two objects have the same shape:

person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}

# It seems obvious with these small dicts
# but when there are dozens or hundreds of keys
# this comes in handy
set_one = set(person) # {"name"}
set_two = set(not_a_person) # {"name", "model_year"}

set_one == set_two # False
set_one.symmetric_difference(set_two) # {"model_year"}

Note that above, only the dict keys are passed into the set. That’s because iterating over a Python dict yields only its keys. To compare values as well, use dict.items(), which yields (key, value) tuples; this only works in a set when the values are hashable.
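A related shortcut: the views returned by dict.keys() and dict.items() support set operators directly, so the set() conversion can often be skipped. A minimal sketch using the same two dicts:

```python
person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}

# keys() views behave like sets; the result is a regular set
shared_keys = person.keys() & not_a_person.keys()
extra_keys = not_a_person.keys() - person.keys()
print(shared_keys)  # {'name'}
print(extra_keys)   # {'model_year'}

# items() views compare (key, value) pairs, assuming hashable values.
# The "name" key is shared, but its values differ, so nothing matches
shared_items = person.items() & not_a_person.items()
print(shared_items)  # set()
```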

Finding set intersections

Use the very appropriately-named intersection() to get a new set containing the values common to both sets:

a = {1, 2, 3}
b = {2, 3, 4}
a.intersection(b)
# {2, 3}

# Intersection also has a shorthand operator: &
a & b
# {2, 3}
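intersection() also works with more than two sets. The sketch below shows passing several sets at once and unpacking a list of sets; c and groups are illustrative names.

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}

# Pass several sets (or other iterables) in one call
common = a.intersection(b, c)
print(sorted(common))  # [3, 4]

# For a list of sets, unpack it into set.intersection().
# This assumes the list is not empty; an empty list raises a TypeError
groups = [a, b, c]
common_all = set.intersection(*groups)
print(sorted(common_all))  # [3, 4]
```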

Supersets and subsets

Use set.issuperset() or set.issubset()¹:

a = {1, 2, 3}
b = {1, 2}

a.issuperset(b) # True
b.issubset(a) # True

# Order matters
a.issubset(b) # False
b.issuperset(a) # False
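These methods have operator forms too: <= and >= match issubset() and issuperset(), while < and > test for a proper subset or superset, meaning the two sets must not be equal. A short sketch:

```python
a = {1, 2, 3}
b = {1, 2}

print(b <= a)  # True, same as b.issubset(a)
print(a >= b)  # True, same as a.issuperset(b)

# Every set is a subset of itself, but not a proper subset
print(a <= a)  # True
print(a < a)   # False
print(b < a)   # True

# The methods accept any iterable; the operators need sets
print(a.issuperset([1, 2]))  # True
```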

Combine two (or more) sets

You can use the union() method to combine sets. It returns a new set and leaves the originals unchanged:

a = {1, 2, 3}
b = {3, 4, 5}
a.union(b)
# {1, 2, 3, 4, 5}

# This has a shorthand also: |
a | b
# {1, 2, 3, 4, 5}
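For the "or more" case, union() accepts several sets in one call. The sketch below also shows unpacking a list of sets and the in-place |= form; groups and running are illustrative names.

```python
a = {1, 2}
b = {2, 3}
c = {3, 4}

# Combine three sets in one call
combined = a.union(b, c)
print(sorted(combined))  # [1, 2, 3, 4]

# Starting from an empty set makes this safe even if the list is empty
groups = [a, b, c]
combined_all = set().union(*groups)
print(sorted(combined_all))  # [1, 2, 3, 4]

# |= updates a set in place instead of building a new one
running = set(a)  # copy, so a itself is left alone
running |= b
print(sorted(running))  # [1, 2, 3]
```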

Frozenset – Immutable sets

The frozenset class is a set which is immutable after it’s created. Once initialized, nothing can ever be added to or removed from a frozen set:

a = frozenset([1, 2, 3])
a.add(2)
# AttributeError: 'frozenset' object has no attribute 'add'

a.clear()
# AttributeError: 'frozenset' object has no attribute 'clear'

This immutability allows frozen sets to be hashable, meaning they can be used as members of other sets or as keys in a dictionary.
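Here is a minimal sketch of both uses. The names and scores are made up for illustration.

```python
# A frozenset as a dict key: handy when the key is an unordered pair
pair_scores = {frozenset({"alice", "bob"}): 3}
score = pair_scores[frozenset({"bob", "alice"})]  # order doesn't matter
print(score)  # 3

# Frozensets as members of another set; equal frozensets dedupe
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2

# A regular set is mutable and unhashable, so it can't be a member
try:
    bad = {set([1, 2])}
    nesting_failed = False
except TypeError:
    nesting_failed = True

print(nesting_failed)  # True
```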

Believe it or not, there are more set methods, and more shorthand operators which I didn’t cover here. To learn more, check out the official Python docs.

Notes

I’m not sure why Python broke with its usual snake_case for issuperset, issubset, and isdisjoint—it makes them harder to read / write. ↩︎