Sets are one of Python’s built-in types, and they’re very useful for deduplicating and comparing collections of data. Sets have a lot of built-in functionality, and this post covers the most common operations.
Here are some jump links to make life easier:
- Creating a set
- Check if a set contains a member
- Add members to a set
- Remove members from a set
- Determine if a list has duplicate values
- Determine the difference between sets
- Finding set intersections
- Supersets and subsets
- Combine sets
- The Frozenset class—Immutable sets
Creating a set
There are a few options:
# Create an empty set
set_one = set()
# Create a set from an existing list
set_two = set([1, 2, 3])
# Create a set with single curly brackets
set_three = {1, 2, 3}
# If you use the single bracket method, you must pass
# elements to the set. Otherwise Python will create a dict
not_a_set = {}
type(not_a_set)
# dict
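As a small addition to the options above, here is a sketch showing that set() accepts any iterable, not only lists, and that a set comprehension works too. The variable names are made up for illustration.

```python
# set() accepts any iterable; duplicate elements collapse
letters = set("hello")
from_tuple = set((1, 2, 2, 3))

# A set comprehension builds a set from an expression
squares = {n * n for n in range(-2, 3)}

print(sorted(letters))     # ['e', 'h', 'l', 'o']
print(sorted(from_tuple))  # [1, 2, 3]
print(sorted(squares))     # [0, 1, 4]
```

Set members must be hashable, so a list cannot be a member of a set but a tuple can.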
Check if a set contains a member
You can check for membership with the classic Python in and not in operators:
my_set = set([1, 2, 3])
1 in my_set
# True
1 not in my_set
# False
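One place membership checks tend to show up is filtering another collection. The sketch below assumes a hypothetical list of requested actions and a set of allowed ones; neither name comes from a real API.

```python
# Hypothetical data, for illustration only
allowed = {"read", "write"}
requested = ["read", "delete", "write", "read"]

# Keep only the requested actions that appear in the allowed set
granted = [action for action in requested if action in allowed]
rejected = [action for action in requested if action not in allowed]

print(granted)   # ['read', 'write', 'read']
print(rejected)  # ['delete']
```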
Add members to a set
Add members one at a time
You can add individual members to a set via set.add():
my_set = {1, 2, 3}
my_set.add(4)
print(my_set)
# {1, 2, 3, 4}
If the element you’re trying to add is already in the set, .add() will do nothing:
my_set = {1, 2, 3}
my_set.add(2)
print(my_set)
# {1, 2, 3}
Or add members in bulk
To add more than one element at once, use set.update() with a list:
my_set = {1, 2, 3}
my_set.update([4, 5])
print(my_set)
# {1, 2, 3, 4, 5}
Update, like add, will not add any duplicate values:
my_set = {1, 2, 3}
my_set.update([2, 3, 4])
print(my_set)
# {1, 2, 3, 4}
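update() is not limited to a single list. As a sketch, it takes any number of iterables, and the |= operator is its in-place shorthand when the right-hand side is a set:

```python
my_set = {1, 2, 3}

# Several iterables at once: a tuple, a range, and a string
# Note that a string is iterated character by character
my_set.update((4, 5), range(5, 8), "ab")
print(my_set == {1, 2, 3, 4, 5, 6, 7, "a", "b"})  # True

# |= is the operator form; it needs a set on the right-hand side
my_set |= {8}
print(8 in my_set)  # True
```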
Remove members from a set
There are several options here:
- set.discard(n) – removes n from the set, does nothing if n isn’t present. Returns None.
- set.remove(n) – removes n from the set, raises a KeyError if n isn’t present. Returns None.
- set.pop() – removes an arbitrary element of the set. Raises a KeyError if the set is already empty. Returns the element which was removed.
- set.clear() – empties the entire set. Returns None.
my_set = {1, 2, 3, 4, 5}
my_set.discard(3) # {1, 2, 4, 5}
my_set.discard(3) # {1, 2, 4, 5} (no error, 3 is already gone)
my_set.remove(2) # {1, 4, 5}
# Calling my_set.remove(2) again would raise KeyError: 2
val = my_set.pop()
print(val, my_set)
# e.g. 1 {4, 5} (which element pop() returns is arbitrary)
my_set.clear() # set()
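The error behavior described above can be sketched with try/except. This is only an illustration of the four methods, not a recommended pattern for every case:

```python
my_set = {1, 2}

# discard() silently ignores a missing element
my_set.discard(99)

# remove() raises KeyError for a missing element
try:
    my_set.remove(99)
except KeyError as err:
    print("remove failed:", err)  # remove failed: 99

# pop() until the set is empty; the order of removal is arbitrary
removed = []
while my_set:
    removed.append(my_set.pop())
print(sorted(removed))  # [1, 2]

# pop() on an empty set raises KeyError
try:
    my_set.pop()
except KeyError:
    print("pop on an empty set raises KeyError")
```

If a missing element is an expected case, discard() avoids the try/except entirely.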
Determine if a list has duplicate values
This comes in handy often when doing quick investigation work:
my_list = [1, 2, 3, 4, 2, 3, 6]
# Set members are always distinct
# This will automatically dedupe the list
my_set = set(my_list)
len(my_list) # 7
len(my_set) # 5
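The length comparison can be wrapped in a small helper. has_duplicates is a hypothetical name, and the sketch assumes the list elements are hashable. collections.Counter is used here as one way to see which values repeat:

```python
from collections import Counter


def has_duplicates(values):
    """Return True if any value appears more than once."""
    return len(set(values)) != len(values)


my_list = [1, 2, 3, 4, 2, 3, 6]
print(has_duplicates(my_list))    # True
print(has_duplicates([1, 2, 3]))  # False

# Counter tallies occurrences, so values with a count above 1 are the duplicates
duplicated = sorted(v for v, count in Counter(my_list).items() if count > 1)
print(duplicated)  # [2, 3]
```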
Determine the difference between sets
There are two … different ways to do this: difference and symmetric_difference.
Using set.difference()
Calling a.difference(b) will give you a new set containing the elements that are in a but not in b. Order matters here, so a.difference(b) will give different results from b.difference(a):
a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a.difference(b)
# {1}
# To get values unique to b, switch the order
unique_to_b = b.difference(a)
# {4}
Python also gives us a shorthand for set.difference, the - operator:
a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a - b
# {1}
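One difference between the method and the operator is worth a sketch: difference() accepts any iterables (and more than one of them), while - requires a set on both sides.

```python
a = {1, 2, 3, 4}

# The method accepts several arguments, and they need not be sets
print(sorted(a.difference([2], (3,))))  # [1, 4]

# The operator raises TypeError when the right-hand side is not a set
try:
    a - [2]
except TypeError:
    print("the - operator needs a set on both sides")
```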
Using set.symmetric_difference()
Symmetric difference between sets is defined as all elements in either set which are not in both sets. Using the same a and b:
a = {1, 2, 3}
b = {2, 3, 4}
# order doesn't matter for symmetric_difference
a.symmetric_difference(b)
# {1, 4}
# This also has a shorthand operator: ^
a ^ b
# {1, 4}
I’m not sure I’d recommend using the ^ operator here as it’s not very commonly seen and could confuse readers of your code.
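To make the definition concrete, the symmetric difference can be rebuilt by hand from operations covered elsewhere in this post (| is union, & is intersection). A quick sketch:

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Elements unique to a, combined with elements unique to b
by_hand = (a - b) | (b - a)

# Equivalently: everything in either set, minus what they share
also_by_hand = (a | b) - (a & b)

print(by_hand == a.symmetric_difference(b))  # True
print(also_by_hand == a ^ b)                 # True
```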
Bonus: set.isdisjoint()
This will return True if two sets have no common elements:
a = {1, 2, 3}
b = {4, 5, 6}
a.isdisjoint(b)
# True
From the Python docs: Sets are disjoint if and only if their intersection is the empty set.
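That definition can be checked directly. In this sketch, c is an extra set added for illustration, and the last line shows that isdisjoint() also accepts a non-set iterable:

```python
a = {1, 2, 3}
b = {4, 5, 6}
c = {3, 4}

# isdisjoint() agrees with "the intersection is the empty set"
print(a.isdisjoint(b), a.intersection(b) == set())  # True True
print(a.isdisjoint(c), a.intersection(c) == set())  # False False

# The argument can be any iterable
print(a.isdisjoint([7, 8]))  # True
```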
Bonus: compare dictionary keys
This has come in handy for me when investigating large dicts. Since a Python dict is technically an iterable, it can be passed into set(), which is a quick way to see if two objects have the same shape:
person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}
# It seems obvious with these small dicts
# but when there are dozens or hundreds of keys
# this comes in handy
set_one = set(person) # {"name"}
set_two = set(not_a_person) # {"name", "model_year"}
set_one == set_two # False
set_one.symmetric_difference(set_two) # {"model_year"}
Note that above, only the dict keys are passed into the set. That’s due to the iterable nature of Python dicts: only the keys are iterated over. To include the values as well, pass dict.items() instead, which works as long as every value is hashable.
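As a related sketch, dict.keys() returns a view that already supports set operators, so the conversion to set() can be skipped when comparing keys. For key/value pairs, set(d.items()) works here because every value is hashable:

```python
person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}

# Key views support -, &, | and ^ directly and return plain sets
missing = not_a_person.keys() - person.keys()
shared = person.keys() & not_a_person.keys()
print(missing)  # {'model_year'}
print(shared)   # {'name'}

# Comparing (key, value) pairs instead of keys alone
differing = set(person.items()) ^ set(not_a_person.items())
print(len(differing))  # 3
```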
Finding set intersections
Use the very appropriately named intersection() to get a new set containing the values common to both sets:
a = {1, 2, 3}
b = {2, 3, 4}
a.intersection(b)
# {2, 3}
# Intersection also has a shorthand operator: &
a & b
# {2, 3}
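intersection() is not limited to two sets. A short sketch with three collections (c is deliberately a list, to show that the method accepts any iterable), plus the in-place variant intersection_update():

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = [3, 4, 5, 6]

# Values present in all three collections
common = a.intersection(b, c)
print(sorted(common))  # [3, 4]

# intersection_update() modifies the set in place instead of returning a new one
a.intersection_update(b)
print(sorted(a))  # [2, 3, 4]
```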
Supersets and subsets
Use set.issuperset() or set.issubset() (see Notes):
a = {1, 2, 3}
b = {1, 2}
a.issuperset(b) # True
b.issubset(a) # True
# Order matters
a.issubset(b) # False
b.issuperset(a) # False
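These methods have comparison-operator shorthands as well. The sketch below also shows the strict forms < and >, which test for a proper subset or superset (one that is not equal to the other set):

```python
a = {1, 2, 3}
b = {1, 2}

print(b <= a)  # True, same as b.issubset(a)
print(a >= b)  # True, same as a.issuperset(b)

# Every set is a subset of itself, but not a proper subset
print(a <= a)  # True
print(a < a)   # False
print(b < a)   # True
```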
Combine two (or more) sets
You can use the union() method to combine sets:
a = {1, 2, 3}
b = {3, 4, 5}
a.union(b)
# {1, 2, 3, 4, 5}
# This has a shorthand also: |
a | b
# {1, 2, 3, 4, 5}
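For the "or more" case, one common idiom is to start from an empty set and unpack a list of sets into union(). The groups list here is a made-up example:

```python
# Hypothetical list of sets to combine
groups = [{1, 2}, {2, 3}, {5}]

# Unpack the list so each set becomes one argument to union()
combined = set().union(*groups)
print(sorted(combined))  # [1, 2, 3, 5]

# union() also accepts non-set iterables
print(sorted({1}.union([2], (3,))))  # [1, 2, 3]
```

Starting from set() means the idiom still works when groups is empty.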
Frozenset – Immutable sets
The frozenset class is a set which is immutable after it’s created. Once initialized, nothing can ever be added to or removed from a frozen set:
a = frozenset([1, 2, 3])
a.add(2)
# AttributeError: 'frozenset' object has no attribute 'add'
a.clear()
# AttributeError: 'frozenset' object has no attribute 'clear'
This immutability allows frozen sets to be hashable, meaning they can be used as members of other sets or as keys in a dictionary.
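A brief sketch of what hashability buys you. The pair_scores dict and its names are invented for illustration:

```python
# A frozenset as a dict key: the order of elements doesn't matter
pair_scores = {frozenset({"alice", "bob"}): 3}
print(pair_scores[frozenset({"bob", "alice"})])  # 3

# Frozensets as members of a set; equal frozensets collapse into one
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(nested))  # 2

# A regular set is mutable, so it cannot be a member of another set
try:
    {{1, 2}}
except TypeError:
    print("a regular set is unhashable")
```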
More
Believe it or not, there are more set methods, and more shorthand operators which I didn’t cover here. To learn more, check out the official Python docs.
Notes
- I’m not sure why Python broke with its usual snake_case for issuperset, issubset, and isdisjoint; it makes them harder to read / write.