Sets are one of Python’s built-in types, and they’re very useful for deduplicating and comparing collections of data. Sets have tons of useful built-in functionality, and this post covers a lot of it.

Here are some jump links to make life easier:

- Creating a set
- Check if a set contains a member
- Add members to a set
- Remove members from a set
- Determine if a list has duplicate values
- Determine the difference between sets
- Finding set intersections
- Supersets and subsets
- Combine two (or more) sets
- Frozenset – Immutable sets

## Creating a set

There are a few options:

```
# Create an empty set
set_one = set()
# Create a set from an existing list
set_two = set([1, 2, 3])
# Create a set with single curly brackets
set_three = {1, 2, 3}
# If you use the curly bracket syntax, you must put
# elements inside. Empty brackets create a dict, not a set
not_a_set = {}
print(type(not_a_set))
# <class 'dict'>
```
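Two more construction options follow naturally from the ones above: a set comprehension, and `set()` applied to any iterable (not just a list). This is a small sketch; the names `squares` and `letters` are only for illustration. The results are printed through `sorted()` because sets have no guaranteed order.

```python
# A set comprehension: like a list comprehension, but with curly brackets
squares = {n * n for n in range(-3, 4)}
print(sorted(squares))
# [0, 1, 4, 9]

# set() accepts any iterable, including a string (one member per character)
letters = set("hello")
print(sorted(letters))
# ['e', 'h', 'l', 'o']
```

Note how the duplicate squares (`1`, `4`, `9`) and the repeated `l` collapse into single members.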

## Check if a set contains a member

You can check for membership with Python’s classic `in` and `not in` operators:

```
my_set = set([1, 2, 3])
1 in my_set
# True
1 not in my_set
# False
```
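A common use of membership checks is filtering one collection against another. Set lookups are hash-based (average O(1)), so `in` on a set is typically much faster than `in` on a long list. A minimal sketch with made-up data; `banned` and `usernames` are hypothetical names:

```python
# Hypothetical data: filter a list of usernames against a set of banned names
banned = {"mallory", "trudy"}
usernames = ["alice", "mallory", "bob", "trudy", "carol"]

# Each "not in banned" check is a hash lookup, not a scan of the whole collection
allowed = [name for name in usernames if name not in banned]
print(allowed)
# ['alice', 'bob', 'carol']
```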

## Add members to a set

### Add members one at a time

You can add individual members to a set via `set.add()`:

```
my_set = {1, 2, 3}
my_set.add(4)
print(my_set)
# {1, 2, 3, 4}
```

If the element you’re trying to add is already in the set, `.add()` will do nothing:

```
my_set = {1, 2, 3}
my_set.add(2)
print(my_set)
# {1, 2, 3}
```
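Two more details about `.add()`, shown in a small sketch: it modifies the set in place and returns `None`, and only hashable values (numbers, strings, tuples, and so on) can be members, so adding a list raises a `TypeError`.

```python
my_set = {1, 2, 3}

# .add() changes the set in place and returns None
result = my_set.add(4)
print(result)
# None

# Tuples are hashable, so they can be members...
my_set.add((5, 6))

# ...but lists are not, so this raises TypeError
try:
    my_set.add([7, 8])
except TypeError as err:
    print("TypeError:", err)
```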

### Or add members in bulk

To add more than one element at once, use `set.update()` with a list (or any other iterable):

```
my_set = {1, 2, 3}
my_set.update([4, 5])
print(my_set)
# {1, 2, 3, 4, 5}
```

Like `.add()`, `.update()` will not add any duplicate values:

```
my_set = {1, 2, 3}
my_set.update([2, 3, 4])
print(my_set)
# {1, 2, 3, 4}
```
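`set.update()` isn’t limited to lists or to a single argument: it accepts any number of iterables. The in-place operator `|=` does the same job, but its right-hand side must be a set (or frozenset). A short sketch:

```python
my_set = {1, 2, 3}

# Several iterables of different types in a single call
my_set.update([4, 5], (6,), range(7, 9))
print(sorted(my_set))
# [1, 2, 3, 4, 5, 6, 7, 8]

# In-place union operator; the right-hand side must be a set or frozenset
my_set |= {9, 10}
print(sorted(my_set))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```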

## Remove members from a set

There are several options here:

- `set.discard(n)` – removes `n` from the set; does nothing if `n` isn’t present. Returns `None`.
- `set.remove(n)` – removes `n` from the set; raises a `KeyError` if `n` isn’t present. Returns `None`.
- `set.pop()` – removes an arbitrary element from the set and returns it. Raises a `KeyError` if the set is already empty.
- `set.clear()` – empties the entire set. Returns `None`.

```
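my_set = {1, 2, 3, 4, 5}
my_set.discard(3) # {1, 2, 4, 5}
my_set.discard(3) # still {1, 2, 4, 5}, no error
my_set.remove(2) # {1, 4, 5}
try:
    my_set.remove(2)
except KeyError as err:
    print("KeyError:", err) # KeyError: 2
val = my_set.pop()
print(val, my_set)
# 1 {4, 5}  (in CPython; which element pop() removes is arbitrary)
my_set.clear() # set()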
```
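Because `pop()` raises `KeyError` on an empty set, one way to process and empty a set is to loop while it’s non-empty (an empty set is falsy). A minimal sketch; the `tasks` data is made up, and since `pop()` removes an arbitrary element, the processing order shouldn’t be relied on.

```python
tasks = {"build", "test", "deploy"}
processed = []

# The loop condition stops before pop() could be called on an empty set
while tasks:
    task = tasks.pop()
    processed.append(task)

print(tasks)
# set()
print(sorted(processed))
# ['build', 'deploy', 'test']
```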

## Determine if a list has duplicate values

This comes in handy often when doing quick investigation work:

```
my_list = [1, 2, 3, 4, 2, 3, 6]
# Set members are always distinct
# This will automatically dedupe the list
my_set = set(my_list)
len(my_list) # 7
len(my_set) # 5
# Fewer members in the set means the list had duplicates
len(my_set) < len(my_list) # True
```
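Comparing lengths answers the yes/no question. If you also want to know *which* values repeat, a second set can track what has already been seen. A small sketch; `has_duplicates` and `find_duplicates` are hypothetical helper names, not built-ins:

```python
def has_duplicates(items):
    # A set drops repeats, so it is shorter than the list only if there were duplicates
    return len(set(items)) != len(items)


def find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            # Already seen once, so this is a repeat
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates


my_list = [1, 2, 3, 4, 2, 3, 6]
print(has_duplicates(my_list))
# True
print(find_duplicates(my_list))
# {2, 3}
```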

## Determine the difference between sets

There are two … different ways to do this: `difference` and `symmetric_difference`.

### Using set.difference()

Calling `a.difference(b)` will give you a new set containing the elements that are in `a` but not in `b`. Order matters here, so `a.difference(b)` will give different results from `b.difference(a)`:

```
a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a.difference(b)
# {1}
# To get values unique to b, switch the order
unique_to_b = b.difference(a)
# {4}
```

Python also gives us a shorthand for `set.difference()`: the `-` operator:

```
a = {1, 2, 3}
b = {2, 3, 4}
unique_to_a = a - b
# {1}
```
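`difference()` also accepts several arguments, removing everything found in any of them, and there are in-place variants that modify the left-hand set instead of returning a new one. A short sketch with fresh example sets (the extra set `c` is made up for illustration):

```python
a = {1, 2, 3, 4, 5}
b = {2, 3}
c = {5}

# Elements of a that appear in neither b nor c
print(a.difference(b, c))
# {1, 4}

# In-place variants modify a instead of returning a new set
a.difference_update(b)  # a is now {1, 4, 5}
a -= c                  # a is now {1, 4}
print(a)
# {1, 4}
```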

### Using set.symmetric_difference()

Symmetric difference between sets is defined as all elements in *either* set which are not in both sets. Using the same `a` and `b`:

```
a = {1, 2, 3}
b = {2, 3, 4}
# order doesn't matter for symmetric_difference
a.symmetric_difference(b)
# {1, 4}
# This also has a shorthand operator: ^
a ^ b
# {1, 4}
```

I’m not sure I’d recommend using the `^` operator here, as it’s not commonly seen and could confuse readers of your code.
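If `^` feels too cryptic, it may help to see that symmetric difference is just the union of the two one-sided differences. A quick check with the same `a` and `b`:

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Elements only in a, plus elements only in b
manual = (a - b) | (b - a)
print(manual)
# {1, 4}
print(manual == a.symmetric_difference(b))
# True

# Equivalently: everything in either set, minus what the two share
print((a | b) - (a & b) == a ^ b)
# True
```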

#### Bonus: set.isdisjoint()

This will return `True` if two sets have **no** common elements:

```
a = {1, 2, 3}
b = {4, 5, 6}
a.isdisjoint(b)
# True
```

From the Python docs: Sets are disjoint if and only if their intersection is the empty set.
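That definition can be checked directly. In this small sketch, `isdisjoint()` gives the same answer as testing whether the intersection is empty, and it also accepts any iterable, not only a set (the set `c` is made up for illustration):

```python
a = {1, 2, 3}
b = {4, 5, 6}
c = {3, 4}

print(a.isdisjoint(b), len(a & b) == 0)
# True True
print(a.isdisjoint(c), len(a & c) == 0)
# False False

# The argument can be any iterable, such as a list
print(a.isdisjoint([7, 8, 9]))
# True
```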

#### Bonus: compare dictionary keys

This has come in handy for me when investigating large dicts. Since a Python `dict` is technically an iterable, it can be passed into `set()`, which is a quick way to see if two objects have the same shape:

```
person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}
# It seems obvious with these small dicts
# but when there are dozens or hundreds of keys
# this comes in handy
set_one = set(person) # {"name"}
set_two = set(not_a_person) # {"name", "model_year"}
set_one == set_two # False
set_one.symmetric_difference(set_two) # {"model_year"}
```

Note that above, only the dict keys are passed into the set. That’s due to the iterable nature of Python dicts: iterating over a dict yields only its keys. To compare the key-value pairs as well, pass `dict.items()` into the set instead (this requires the values to be hashable).
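A related shortcut: the view returned by `dict.keys()` already behaves like a set, and so does `dict.items()` when the values are hashable, so both support the set operators without an explicit `set()` call. A sketch using the same `person` and `not_a_person` dicts:

```python
person = {"name": "justin"}
not_a_person = {"name": "Toyota", "model_year": 2007}

# Key views support set operators directly and return ordinary sets
print(person.keys() ^ not_a_person.keys())
# {'model_year'}
print(person.keys() & not_a_person.keys())
# {'name'}

# Item views compare key-value pairs; "name" has different values, so both pairs appear
print(sorted(person.items() ^ not_a_person.items()))
# [('model_year', 2007), ('name', 'Toyota'), ('name', 'justin')]
```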

## Finding set intersections

Use the very appropriately-named `intersection()` to get a new set containing the values common to both sets:

```
a = {1, 2, 3}
b = {2, 3, 4}
a.intersection(b)
# {2, 3}
# Intersection also has a shorthand operator: &
a & b
# {2, 3}
```
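`intersection()` takes any number of arguments, which makes it easy to find what a whole group of sets has in common. A minimal sketch; the `tags_by_post` data is made up for illustration:

```python
tags_by_post = [
    {"python", "sets", "tutorial"},
    {"python", "sets", "performance"},
    {"python", "sets", "dicts"},
]

# Unpack the list so each set becomes a separate argument
common = set.intersection(*tags_by_post)
print(sorted(common))
# ['python', 'sets']
```

`set.intersection(*tags_by_post)` needs at least one set in the list; called with an empty list it raises a `TypeError`.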

## Supersets and subsets

Use `set.issuperset()` or `set.issubset()`¹:

```
a = {1, 2, 3}
b = {1, 2}
a.issuperset(b) # True
b.issubset(a) # True
# Order matters
a.issubset(b) # False
b.issuperset(a) # False
```
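These methods also have operator forms: `<=` and `>=` for subset and superset, plus `<` and `>` for *proper* subsets and supersets, which additionally require the two sets to be unequal. Unlike the methods, the operators need sets (or frozensets) on both sides. A quick sketch:

```python
a = {1, 2, 3}
b = {1, 2}

print(b <= a, a >= b)
# True True
print(b < a)  # proper subset: b is a subset of a and b != a
# True
print(a < a, a <= a)  # a set is a subset of itself, but not a proper one
# False True
```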

## Combine two (or more) sets

You can use the `union()` method to combine sets:

```
a = {1, 2, 3}
b = {3, 4, 5}
a.union(b)
# {1, 2, 3, 4, 5}
# This has a shorthand also: |
a | b
# {1, 2, 3, 4, 5}
```
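To combine more than two sets, `union()` accepts any number of iterables, and the `|` operator can simply be chained. A small sketch with an extra, made-up set `c`:

```python
a = {1, 2, 3}
b = {3, 4, 5}
c = {5, 6}

print(a.union(b, c))
# {1, 2, 3, 4, 5, 6}
print(a | b | c)
# {1, 2, 3, 4, 5, 6}

# To combine a list of sets, start from an empty set and unpack the list
list_of_sets = [a, b, c]
print(set().union(*list_of_sets))
# {1, 2, 3, 4, 5, 6}
```

Starting from `set()` also handles an empty `list_of_sets` gracefully: the result is just an empty set.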

## Frozenset – Immutable sets

The `frozenset` class is like a `set`, except that it’s immutable after it’s created. Once initialized, nothing can ever be added to or removed from a frozen set:

```
a = frozenset([1, 2, 3])
a.add(2)
# AttributeError: 'frozenset' object has no attribute 'add'
a.clear()
# AttributeError: 'frozenset' object has no attribute 'clear'
```

This immutability allows frozen sets to be *hashable*, meaning they can be used as members of other sets or as keys in a dictionary.
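A short sketch of what hashability buys you. The `edge_weights` mapping and the node names `"A"`, `"B"`, `"C"` are hypothetical; the useful property is that `frozenset({"A", "B"})` equals `frozenset({"B", "A"})`, so member order doesn’t matter for lookups.

```python
# Hypothetical example: weights keyed by an unordered pair of nodes
edge_weights = {
    frozenset({"A", "B"}): 3,
    frozenset({"B", "C"}): 7,
}

# Member order doesn't matter when looking up a key
print(edge_weights[frozenset({"B", "A"})])
# 3

# A set of frozensets works, and equal frozensets collapse into one member
pairs = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3, 4})}
print(len(pairs))
# 2

# A regular set is unhashable, so a set of sets raises TypeError
try:
    nested = {{1, 2}}
except TypeError as err:
    print("TypeError:", err)
```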

## More

Believe it or not, there are more `set` methods, and more shorthand operators which I didn’t cover here. To learn more, check out the official Python docs.

**Notes**

- ¹ I’m not sure why Python broke with its usual snake_case for `issuperset`, `issubset`, and `isdisjoint`; it makes them harder to read and write.