To replace a string or substring in Python you have two main options:
- The built-in string.replace() method
- re.sub() from Python’s standard library
Python’s String Replace Method
The string.replace method (listed as str.replace in the official docs) works roughly how you would expect it to:
my_string = "abc123"
new_string = my_string.replace("b", "X")
print(new_string)
# aXc123
By default, string.replace replaces every occurrence of the substring you are targeting. It also accepts an optional third argument, an integer count, that caps the number of replacements, working from the left:
my_string = "aaabbbccc"
my_string.replace("a", "X", 2)
# "XXabbbccc"
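Here is a quick sketch of how that count behaves at the edges. The variable names are made up for illustration; the behavior shown is what the built-in method does for a count that is larger than the number of matches, or zero.

```python
my_string = "aaabbbccc"

# Replace only the first two matches, scanning left to right
first_two = my_string.replace("a", "X", 2)
print(first_two)  # XXabbbccc

# A count larger than the number of matches just replaces them all
all_of_them = my_string.replace("a", "X", 10)
print(all_of_them)  # XXXbbbccc

# A count of 0 replaces nothing
none_of_them = my_string.replace("a", "X", 0)
print(none_of_them)  # aaabbbccc
```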
One important note: string.replace does not modify the existing string. Python strings are immutable, so it returns a new copy with the replacement performed.
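A small sketch makes this easy to see. The names original and replaced are just placeholders for this example:

```python
original = "abc123"
replaced = original.replace("b", "X")

print(original)  # abc123  (unchanged)
print(replaced)  # aXc123

# To "update" a variable, assign the result back to the same name
original = original.replace("b", "X")
print(original)  # aXc123
```

Forgetting to capture the return value is an easy mistake to make: a bare call on its own line runs without error, but the result is thrown away.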
Since string.replace returns a string, you can also chain replace operations:
# Chain as many replacements as you like
my_string = "aaabbbccc"
my_string.replace("a", "X", 2).replace("b", "").replace("c", "Z")
# "XXaZZZ"
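One thing worth keeping in mind when chaining: each call runs on the output of the one before it, so the order can change the result. A minimal sketch, using a made-up input string:

```python
text = "aabb"

# Each call works on the result of the previous one,
# so an earlier replacement can feed a later one
a_then_b = text.replace("a", "b").replace("b", "c")
print(a_then_b)  # cccc

# Reversing the order gives a different result
b_then_a = text.replace("b", "c").replace("a", "b")
print(b_then_a)  # bbcc
```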
However, for more complex replacements, you might want to use regex instead.
Python Regex re.sub
Unless I’m doing a simple character swap or character removal (replacing with an empty string), I tend to use regex. If you can write the regex pattern, re.sub can probably use it to replace content for you.
Regex is a deep topic, and I actually wrote up a regex cheatsheet post, but here’s a simple re.sub example:
import re
my_string = "abc123"
# remove all digits
# use a raw string (r"...") so the backslash reaches the regex engine as-is
new_string = re.sub(r"\d", "", my_string)
print(new_string)
# "abc"
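To give a feel for the kind of replacement that is awkward with string.replace but short with re.sub, here is a rough sketch of two common cases: collapsing runs of whitespace, and reordering parts of a match with capture groups. The sample strings are invented for this example.

```python
import re

messy = "too    many   spaces"
# \s+ matches one or more whitespace characters in a row
tidy = re.sub(r"\s+", " ", messy)
print(tidy)  # too many spaces

# Capture groups can be reused in the replacement as \1, \2, ...
date = "2024-01-31"
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(swapped)  # 31/01/2024
```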
Like string.replace above, re.sub does not modify the original string; it returns a new copy.
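The parallel with string.replace goes a little further: re.sub also takes an optional count that limits how many matches get replaced. A short sketch, reusing the digits example with illustrative variable names:

```python
import re

my_string = "abc123"
no_digits = re.sub(r"\d", "", my_string)

print(my_string)  # abc123  (the original is untouched)
print(no_digits)  # abc

# count=1 stops after the first match
first_only = re.sub(r"\d", "", my_string, count=1)
print(first_only)  # abc23
```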
Here’s an example from my day job, comparing two XML files:
# Similar to an actual task I had to do at work recently
import re
# These two files were supposed to be the same, but the
# spacing and indentation made them hard to compare
with open("file_one.xml", "r") as file_one:
    xml_one = file_one.read()
# Don't forget your context managers!
with open("file_two.xml", "r") as file_two:
    xml_two = file_two.read()
tabs_or_newlines = r"[\t\n]"
# substitute with empty string "" to remove tabs and newlines
xml_one_stripped = re.sub(tabs_or_newlines, "", xml_one)
xml_two_stripped = re.sub(tabs_or_newlines, "", xml_two)
# Without the weird tabs and line breaks, they should be the same
if xml_one_stripped == xml_two_stripped:
    print("they're the same")
else:
    print("not the same")
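If you want to try the idea without any files on disk, the sketch below does the same comparison on two in-memory strings. The XML snippets and the strip_layout helper are hypothetical stand-ins, not the real files or code from that task.

```python
import re

# Hypothetical XML snippets standing in for the two files
xml_one = "<root>\n\t<item>1</item>\n\t<item>2</item>\n</root>\n"
xml_two = "<root><item>1</item><item>2</item></root>"

tabs_or_newlines = r"[\t\n]"


def strip_layout(xml_text):
    """Return xml_text with every tab and newline removed."""
    return re.sub(tabs_or_newlines, "", xml_text)


same_before = xml_one == xml_two
same_after = strip_layout(xml_one) == strip_layout(xml_two)

print(same_before)  # False
print(same_after)   # True
```

Note that this pattern only covers tabs and newlines. Files indented with spaces, or saved with Windows-style line endings (which add a carriage return), would still compare as different unless the pattern is extended.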
For more information, check out the helpful doc links below.
Helpful Links
- String.replace – official Python docs
- Python re.sub – official Python docs
- Python context managers – Me!