In this post, we will go over the basics of strings in Python and explore the built-in functionality available to us. Each section comes with a brief description and some sample code.
Print New Line
Printing output to the console is easy: we simply use the print() function. Sometimes we want the output to span multiple lines. We can either use the \n escape character or a multi-line string enclosed in triple quotes (''' ''').
# New line using \n
print('This is the first line.\nThis is the second line')
# New line using '''
print('''This is the first line.
This is the second line.''')
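To confirm that the two approaches build the same text, we can compare them directly. This is a small sketch; the variable names are only illustrative, and splitlines() is used here simply to show where the newline falls.

```python
# The same two-line string written both ways
with_escape = 'This is the first line.\nThis is the second line.'
with_triple = '''This is the first line.
This is the second line.'''

# Both literals produce identical strings
print(with_escape == with_triple)  # True

# splitlines() breaks the string at each newline character
print(with_escape.splitlines())  # ['This is the first line.', 'This is the second line.']
```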
Escape Special Characters
Some special characters need to be escaped when they appear inside strings. For instance, if we use single quotes to delimit a string, any single quote inside it must be escaped with a backslash. Likewise, a backslash followed by certain characters forms an escape sequence, such as \n for a new line, as mentioned above, so printing a literal backslash requires escaping it as \\. A neat alternative is a raw string, prefixed with r, in which backslashes are treated as literal characters.
# Escape single quote, prints You're amazing
print('You\'re amazing')
# Use double quotes on the outside, no need to escape, prints You're amazing
print("You're amazing")
# Escape the \, where we want to print the actual \n characters, prints brooklyn\nets
print('brooklyn\\nets')
# Using raw string to avoid tab (\t), prints oklahoma city\thunder
print(r'oklahoma city\thunder')
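The difference between a normal and a raw string becomes visible if we measure them or look at their repr(). A quick sketch reusing the string from the last example:

```python
# \t inside a normal string becomes a single tab character
normal = 'oklahoma city\thunder'
# In a raw string, the backslash and the t stay as two separate characters
raw = r'oklahoma city\thunder'

print(len(normal), len(raw))  # 20 21
print(raw == 'oklahoma city\\thunder')  # True

# repr() shows escape sequences explicitly
print(repr(normal))  # 'oklahoma city\thunder'
print(repr(raw))  # 'oklahoma city\\thunder'
```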
String Length
The built-in len() function is straightforward to use with strings: it returns the length, in other words the number of characters (letters, digits, symbols, spaces, or any other character) in the string.
# Length of string, prints 35
print(len('This string contains 35 characters!'))
# Length of empty string, prints 0
print(len(''))
# Length of special escape characters \n or \t, prints 1
print(len('\n'))
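To see that len() really counts every character, including punctuation and escape characters, we can count them by hand and compare. The sample strings below are my own illustrative inputs. The second part shows that len() counts characters rather than bytes: the accented é is one character but takes two bytes in UTF-8.

```python
sample = 'Hi, you!\n'

# Count characters by hand: iterating over a string yields one character at a time
manual = 0
for _ in sample:
    manual += 1

print(len(sample), manual)  # 9 9

# len() counts characters, not bytes
word = 'caf\u00e9'  # 'café'
print(len(word))  # 4
print(len(word.encode('utf-8')))  # 5
```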
String Slicing
Just like lists in Python, strings can be sliced using square bracket [] notation. Taking the string 'New York Knicks' as an example, each character can be accessed through its index. The first character N is at index 0 and the last character s is at index 14. Each character can also be accessed through a negative index that counts back from the end: s is at index -1 and N is at index -15.
We can also specify a range with the notation [start:end], where the start index is inclusive and the end index is exclusive. In other words, [3:8] accesses the characters at indices 3 to 7. Either index can be omitted to slice from the beginning or to the end of the string, and positive and negative indices can be combined in the same slice.
text = 'New York Knicks'
# Get character at index 0, prints N
print(text[0])
# Get character at index -1 (last character), prints s
print(text[-1])
# Get characters from first to 8th index (exclusive), prints New York
print(text[:8])
# Get characters from 9th to last index, prints Knicks
print(text[9:])
# Get characters from 4th to 7th index, prints York
print(text[4:8])
# Get characters from -11th to 7th index, also prints York
print(text[-11:8])
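A small loop can print the positive and negative index of every character, which makes the relationship between the two explicit: a negative index -k points at the same character as len(text) - k. This is just an illustrative sketch; the column widths in the f-string are arbitrary.

```python
text = 'New York Knicks'
size = len(text)  # 15

# Print each character with its positive and negative index
for i, ch in enumerate(text):
    print(f'{ch!r:>5} {i:>3} {i - size:>4}')

# Every character is reachable through both indices
assert all(text[i] == text[i - size] for i in range(size))
```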
Substring Counts
We can count the number of occurrences of a given substring within a string with the count() method. Keep in mind that it is case sensitive: uppercase L and lowercase l are different characters and are not counted as the same. We can also specify a range in which to conduct the counting.
Parameters
sub: substring to count in the string
start: start index for slicing [optional]
end: end index for slicing [optional]
text = 'Los Angeles Lakers'
# Count number of occurrences of L, prints 2
print(text.count('L'))
# Count number of occurrences of Los, prints 1
print(text.count('Los'))
# Count number of occurrences of Z, prints 0
print(text.count('Z'))
# Count number of occurrences of e between indices 4 and 11 (Angeles), prints 2
print(text.count('e', 4, 11))
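Two details of count() are worth showing. Because it is case sensitive, a common workaround for a case-insensitive count is to lowercase the string first. Also, matches do not overlap, so 'banana' contains 'ana' only once as far as count() is concerned. The 'banana' string is just an illustrative input.

```python
text = 'Los Angeles Lakers'

# count() is case sensitive, so lowercase the text first for a case-insensitive count
print(text.count('l'))  # 1
print(text.lower().count('l'))  # 3

# Matches do not overlap: after a match, the search resumes past its end
print('banana'.count('ana'))  # 1
```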
Find Substring Index
To find the first occurrence (lowest index) of a specific substring in a given string, we can use the find() or index() methods. To find the last occurrence (highest index), we can use rfind() or rindex(). The main difference within each pair is that when the substring is not found, find() and rfind() return -1, while index() and rindex() raise a ValueError.
Parameters
sub: substring pattern to locate in the string
start: start index for slicing [optional]
end: end index for slicing [optional]
text = 'The Toronto Raptors won the Championship last year'
# Find lowest index of character o, prints 5
print(text.find('o'))
# Find highest index of character o, prints 34
print(text.rfind('o'))
# Find lowest index of character o between index 12 and 23 (Raptors won), prints 16
print(text.find('o', 12, 23))
# Find highest index of character o between index 12 and 23 (Raptors won), prints 21
print(text.rfind('o', 12, 23))
# Find lowest index of first character of substring Champion, prints 28
print(text.find('Champion'))
# Substring not found using find(), prints -1
print(text.find('$'))
# Substring not found using index(), raises ValueError (caught here so the script keeps running)
try:
    print(text.index('$'))
except ValueError as err:
    print('ValueError:', err)
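The optional start parameter makes it possible to collect every occurrence, not just the first or last one: after each match, we search again starting one character further. The helper find_all below is a hypothetical name for this sketch, not a built-in. Note that because it resumes one character past the previous match start, it also reports overlapping matches, unlike count().

```python
def find_all(text, sub):
    """Return every index where sub starts in text (hypothetical helper)."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search one character past the previous match start
        start = text.find(sub, start + 1)
    return positions


text = 'The Toronto Raptors won the Championship last year'
print(find_all(text, 'o'))  # [5, 7, 10, 16, 21, 34]
print(find_all(text, '$'))  # []
```

The first and last entries of the result match what find() and rfind() return on their own.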
Replace Specific Substring
Strings also have a built-in replace() method that returns a copy of the string in which occurrences of a given pattern are replaced with something else. We can also specify the maximum number of replacements to make, starting from the lowest index.
Parameters
old: substring pattern to be replaced
new: new substring replacing old
count: maximum number of replacements, starting from the lowest index [optional]
text = 'Jordan won many rings. Jordan is the GOAT. Jordan played for the Bulls.'
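# Replace all occurrences of Jordan with Kobe
# Prints Kobe won many rings. Kobe is the GOAT. Kobe played for the Bulls.
print(text.replace('Jordan', 'Kobe'))
# Since Kobe never played for the Bulls, replace only the first 2 occurrences of Jordan with Kobe
# Prints Kobe won many rings. Kobe is the GOAT. Jordan played for the Bulls.
print(text.replace('Jordan', 'Kobe', 2))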
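Since Python strings are immutable, replace() never modifies the original string; it hands back a new one, which we need to store (or reassign) to keep. A short sketch illustrating this, where 'Bird' and 'Magic' are just arbitrary patterns that do not occur in the text:

```python
text = 'Jordan won many rings. Jordan is the GOAT. Jordan played for the Bulls.'

# replace() returns a new string; text itself is left unchanged
updated = text.replace('Jordan', 'Kobe', 2)
print(text.count('Jordan'), updated.count('Jordan'))  # 3 1

# Reassign the variable to keep the result
text = text.replace('GOAT', 'greatest of all time')
print(text)

# If the pattern is absent, an equal string comes back
print(text.replace('Bird', 'Magic') == text)  # True
```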
String Concatenation and Formatting
There are several ways to concatenate multiple strings into a single one. To join string literals, it suffices to write them next to each other, with or without whitespace between them; Python joins adjacent literals automatically, without inserting any spaces. This does not work with string variables, however. Instead, we can use the + operator between two variables or strings to concatenate them. To repeat a string multiple times and concatenate the copies, we can use the * operator with an integer.
# Concatenate string literals (spaces must be included inside the literals), prints LeBron James is King
print('LeBron ' 'James ' 'is ' 'King')
# Store string in variables
first, last, team = 'Kevin', ' Durant', 'Brooklyn Nets'
# Concatenate using + operator, prints Kevin Durant plays on the Brooklyn Nets
print(first + last + ' plays on the ' + team)
# Concatenate repeated string using * operator, prints Go Durant! Go Durant! Go Durant!
print(('Go' + last + '! ') * 3)
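Two practical consequences of these rules are sketched below. Adjacent literals inside parentheses are a handy way to split a long string across several source lines. And + only works between strings, so a number has to be converted with str() first; otherwise Python raises a TypeError. The variable names here are illustrative only.

```python
# Adjacent literals inside parentheses are joined into one string,
# which is handy for splitting a long literal across several lines
message = ('Kevin Durant '
           'plays on the '
           'Brooklyn Nets')
print(message)  # Kevin Durant plays on the Brooklyn Nets

# + only joins a str with another str, so numbers must be converted first
times = 3
try:
    chant = 'Go Durant! x' + times
except TypeError as err:
    print('TypeError:', err)

chant = 'Go Durant! x' + str(times)
print(chant)  # Go Durant! x3
```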
We can also concatenate strings with the join() method, which is called on the separator string that goes between the elements. Given an iterable of strings (e.g. a list or tuple) as input, it returns all the elements combined into one string.
Parameters
iterable: any iterable of strings, such as a list, tuple, string, or dictionary (whose keys are joined)
# Concatenate strings using the join() function, prints Kevin Durant signed with the Brooklyn Nets
print(''.join([first, last, ' signed with the ', team]))
# Concatenate strings with dash separator, prints boston-celtics-2007-2008
print('-'.join(['boston', 'celtics', '2007', '2008']))
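Because join() only accepts strings, numbers such as the years above must be converted first; map(str, ...) is one common way to do that. The sketch also shows what happens when the iterable is itself a string (its characters are joined) or a dictionary (its keys are joined). The dictionary is a made-up example.

```python
# join() needs an iterable of strings; map(str, ...) converts ints first
years = [2007, 2008]
print('-'.join(['boston', 'celtics'] + list(map(str, years))))  # boston-celtics-2007-2008

# Joining a string iterates over its characters
print(' '.join('NBA'))  # N B A

# Joining a dictionary joins its keys, in insertion order
info = {'city': 'Boston', 'name': 'Celtics'}
print(', '.join(info))  # city, name
```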
A common way of building a string from variables is the format() method, which takes the values to be inserted as arguments. Curly braces {} act as replacement fields that determine where each argument goes: by default, the first argument fills the first pair of braces, the second fills the next, and so on. We can also choose which argument fills each pair of braces by writing its index, or the name of a keyword argument, inside the braces.
Parameters
*args, **kwargs: any number of positional and keyword arguments
# Store string in variables
first, last, team = 'Steph', 'Curry', 'Golden State Warriors'
# Concatenate string variables using format()
# Prints Steph Curry from downtown, BANG! Golden State Warriors with the lead!
print('{} {} from downtown, BANG! {} with the lead!'.format(first, last, team))
# Specify index of replacement fields, prints one, two, three
print('{1}, {2}, {0}'.format(*('three', 'one', 'two')))
# Do arithmetic in the arguments, prints Dirk Nowitzki wore number 41
print('Dirk Nowitzki wore number {}'.format(36 + 5))
# Using a keyword argument, prints Harden has a beard
print('{lastname} has a beard'.format(lastname='Harden'))
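The **kwargs side of format() pairs nicely with dictionaries: a dictionary can be unpacked straight into named replacement fields. The sketch below also shows that an index can be reused and that positional and keyword arguments can be mixed. The player dictionary and the sentences are illustrative only.

```python
# A dictionary can be unpacked into keyword arguments with **
player = {'first': 'Steph', 'last': 'Curry'}
print('{first} {last} hits another three'.format(**player))  # Steph Curry hits another three

# The same index can be used more than once
print('{0}, {0}, {1}!'.format('Bang', 'Curry'))  # Bang, Bang, Curry!

# Positional and keyword arguments can be mixed
print('{0} and {partner}'.format('Curry', partner='Thompson'))  # Curry and Thompson
```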
There is also a newer way of formatting strings: f-Strings, which begin with an f in front of the string literal. This syntax has been available since Python 3.6, and it allows us to embed expressions, such as variable names, directly within the string. Often described as a more elegant alternative to format(), it can make code cleaner and easier to read. Inside the braces we can still call functions, access attributes of objects created from classes, and so on.
Although this new way of formatting is convenient and can make code cleaner, remember that it is an addition, not a replacement. We can still use format(), and in some cases it makes more sense to do so. For instance, in the second format() example, where we print one, two, three, the older technique is more suitable: since the data is in a tuple, we can unpack it directly into the method call and pick the replacement positions in the string. With f-Strings, there is no equivalent unpacking, so we have to access each item of the tuple individually.
# Concatenate string variables using f-String
# Prints Steph Curry from downtown, BANG! Golden State Warriors with the lead!
print(f'{first} {last} from downtown, BANG! {team} with the lead!')
# Need to access each item in the tuple individually, prints one, two, three
numbers = ('three', 'one', 'two')
print(f'{numbers[1]}, {numbers[2]}, {numbers[0]}')
# Do arithmetic inside string, prints Dirk Nowitzki wore number 41
print(f'Dirk Nowitzki wore number {36 + 5}')
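To illustrate the claim that f-Strings can call functions and use objects, here is a small sketch. The Player class is hypothetical and exists only for this example. The last line shows that the format specifications known from format() (here, rounding to two decimals) also work after a colon inside the braces.

```python
class Player:
    """Hypothetical class used only to illustrate attribute access."""

    def __init__(self, name, number):
        self.name = name
        self.number = number


player = Player('Dirk Nowitzki', 41)

# Method calls and attribute access work inside the braces
print(f'{player.name.upper()} wore number {player.number}')  # DIRK NOWITZKI wore number 41

# Format specifications go after a colon, just like with format()
ratio = 2 / 3
print(f'Ratio: {ratio:.2f}')  # Ratio: 0.67
```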