Python String Basics
In Python 3, a string (str) is an immutable sequence of Unicode code points. In Python 2, str was an immutable sequence of 8-bit bytes, and Unicode text used a separate unicode type. Immutable means that once a string is created, it cannot be changed. String methods never modify the existing string; they return their result as a string and leave the original untouched.
my_string = 'I am a Python string.'
print(my_string) # output: I am a Python string.
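To make the immutability point concrete, here is a minimal sketch. The variable names (original, shouted, assignment_failed) are illustrative only. It shows a method returning a new string and an attempted in-place change raising TypeError.

```python
# Strings are immutable: upper() builds a new string
# and leaves the original untouched.
original = 'I am a Python string.'
shouted = original.upper()

print(original)  # output: I am a Python string.
print(shouted)   # output: I AM A PYTHON STRING.

# Assigning to a single position is not allowed and raises TypeError.
assignment_failed = False
try:
    original[0] = 'i'
except TypeError:
    assignment_failed = True

print(assignment_failed)  # output: True
```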
String literals can be enclosed in single, double, or triple quotes. You can use single quotes inside double-quoted and triple-quoted strings, and vice versa, without escaping them. A long string literal can be continued on the next source line by ending the line with a backslash "\", and a string that itself contains line breaks can be enclosed in triple quotes. As in most other programming languages, special characters such as line breaks and tabs are written as escape sequences that start with a backslash "\".
my_string1 = 'Backslash\r\nExample'
print(my_string1)
# output: Backslash
# output: Example

my_string2 = """Line 1
Line 2
Line 3"""
print(my_string2)
# output: Line 1
# output: Line 2
# output: Line 3
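The quoting and line-continuation rules described above can be sketched as follows. The sample sentences and variable names are made up for illustration.

```python
# Mixing quote styles avoids the need to escape the inner quotes.
quoted = "It's a 'single-quoted' word inside double quotes"

# The same apostrophe can be escaped with a backslash instead.
escaped = 'It\'s an escaped apostrophe'

# A backslash at the end of the line continues the literal on the next
# source line without inserting a line break into the string.
continued = 'This is one long string \
written across two source lines.'

print(quoted)     # output: It's a 'single-quoted' word inside double quotes
print(escaped)    # output: It's an escaped apostrophe
print(continued)  # output: This is one long string written across two source lines.
```

Note that the continued line starts at the left margin: any leading spaces on that line would become part of the string.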
Python does not have a dedicated data type for a single character; a character is just a string of length 1. Individual characters can be accessed using square brackets "[]". In Python, the index starts at zero.
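A short sketch of character access by index. The negative index on the third line is an addition not covered in the text above: it counts from the end of the string.

```python
word = 'Python'

print(word[0])        # output: P  (the first character has index 0)
print(word[5])        # output: n  (the last character has index len(word) - 1)
print(word[-1])       # output: n  (negative indexes count from the end)

# A single character is itself a string of length 1.
print(type(word[0]))  # output: <class 'str'>
print(len(word[0]))   # output: 1
```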
Python 2 represents strings in two ways: regular (byte) strings and Unicode strings. In Python 3, every str is a Unicode string, and raw bytes have a separate bytes type. String splitting techniques work with both kinds of strings.
In Python 2, regular strings are stored as arrays of 8-bit bytes, with each byte representing one character in a single-byte encoding such as ASCII. Since a byte can hold a value between 0 and 255, such an encoding can represent at most 256 characters and is not suitable for writing systems with more characters (for example, Chinese). Python 2 creates regular strings by default unless otherwise noted; Python 3 creates Unicode strings by default and keeps raw bytes in the separate bytes type.
# A byte string in Python 2; in Python 3 this literal is already Unicode.
regular_string = 'I am a regular string'
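Unicode strings are sequences of Unicode code points rather than bytes. Unicode defines code points up to U+10FFFF, far more than the 256 values of a single byte, so Unicode strings can represent the characters of practically every writing system, including Chinese. In Python 2, you create a Unicode string by adding the "u" prefix in front of the string literal. In Python 3, the prefix is still accepted but has no effect, because every string is already Unicode.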
unicode_string = u'I am a Unicode string'
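For readers on Python 3, the following sketch shows how the two kinds of strings look there: str for text and bytes for raw 8-bit data. The sample text and the choice of UTF-8 are assumptions made for illustration. Both types support split().

```python
# Python 3 only: str holds Unicode text, bytes holds raw 8-bit values.
text = '你好, Python'
data = text.encode('utf-8')  # UTF-8 is an assumed encoding for this sketch

print(len(text))  # output: 10  (characters)
print(len(data))  # output: 14  (each Chinese character takes 3 bytes in UTF-8)

# split() works on both types; the delimiter must match the type.
text_parts = text.split(', ')
byte_parts = data.split(b', ')
print(text_parts)            # output: ['你好', 'Python']
print(byte_parts[1])         # output: b'Python'

# The u prefix is accepted in Python 3 but changes nothing.
print(u'Python' == 'Python')  # output: True
```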
Method #1. Splitting a string using split()
The easiest and most commonly used way to split a string is the split() method. By default, it splits the string on runs of whitespace (spaces, tabs, and line breaks) and discards empty strings from the result. To split on another character, or on a longer substring, pass the delimiter as an argument to split().
print('I am Python string.'.split()) # output: ['I', 'am', 'Python', 'string.']
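One detail worth illustrating is that calling split() with no argument is not the same as calling split(' '). The sketch below uses a made-up string with irregular whitespace to show the difference.

```python
messy = '  one   two\tthree\n'

# No argument: runs of any whitespace act as one separator,
# and leading/trailing whitespace produces no empty strings.
default_parts = messy.split()
print(default_parts)  # output: ['one', 'two', 'three']

# Explicit ' ': every single space is a separator, so consecutive
# spaces yield empty strings, and tabs and line breaks are kept.
space_parts = messy.split(' ')
print(space_parts)    # output: ['', '', 'one', '', '', 'two\tthree\n']
```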
An example of splitting a string by the "-" character using the split() method.
print('python-string-example'.split('-')) # output: ['python', 'string', 'example']
An example of splitting a string using a substring as a delimiter.
print('I am a Python string'.split('Python')) # output: ['I am a ', ' string']
The split() method also has a maxsplit parameter, which specifies the maximum number of splits to perform, so the result has at most maxsplit + 1 elements.
print('Python string example'.split(' ', 1)) # output: ['Python', 'string example']
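A typical use of maxsplit is separating a key from a value that may itself contain the delimiter. The following is a minimal sketch; the input line and the names key and value are hypothetical.

```python
# Hypothetical "key=value" line whose value also contains '=' characters.
line = 'url=https://example.com/?a=1&b=2'

# Splitting only once keeps the rest of the line intact as the value.
key, value = line.split('=', 1)
print(key)    # output: url
print(value)  # output: https://example.com/?a=1&b=2

# Without maxsplit, the value would be broken into pieces as well.
all_parts = line.split('=')
print(len(all_parts))  # output: 4
```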
Method #2. Splitting a string using rsplit()
The rsplit() method has the same signature as split() and behaves the same way, except that it splits from right to left. The difference is visible only when maxsplit is specified, because the splits are then counted from the end of the string.
print('word1 word2 word3'.split(' ', 1)) # output: ['word1', 'word2 word3']
print('word1 word2 word3'.rsplit(' ', 1)) # output: ['word1 word2', 'word3']
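As one possible application, rsplit() with maxsplit=1 can separate the last dot-delimited part of a name from everything before it. The file name below is hypothetical, and this is only a sketch of the technique, not a replacement for dedicated path-handling tools.

```python
filename = 'archive.backup.tar'  # hypothetical file name

# Split once from the right: everything before the last dot, then the rest.
stem, extension = filename.rsplit('.', 1)
print(stem)       # output: archive.backup
print(extension)  # output: tar

# Split once from the left for comparison.
first, rest = filename.split('.', 1)
print(first)      # output: archive
print(rest)       # output: backup.tar

# Without maxsplit, both methods give the same result.
print(filename.split('.') == filename.rsplit('.'))  # output: True
```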
Method #3. Splitting a string using splitlines()
The Python splitlines() method splits text at line breaks.
print('one\ntwo\nthree'.splitlines()) # output: ['one', 'two', 'three']
When calling the splitlines() method, you can pass keepends=True so that the line breaks are kept at the end of each line in the result.
print('one\ntwo\nthree'.splitlines(True)) # output: ['one\n', 'two\n', 'three']
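It may help to compare splitlines() with split('\n') on text that mixes line-ending styles. The sample text below is made up for illustration.

```python
# Sample text with a Windows line ending, a Unix line ending,
# and a trailing line break.
text = 'one\r\ntwo\nthree\n'

# splitlines() recognizes both '\r\n' and '\n' and does not
# produce an empty string for the trailing line break.
lines = text.splitlines()
print(lines)      # output: ['one', 'two', 'three']

# split('\n') leaves the '\r' in place and adds a trailing empty string.
naive = text.split('\n')
print(naive)      # output: ['one\r', 'two', 'three', '']

# keepends=True preserves whichever line break each line had.
with_ends = text.splitlines(keepends=True)
print(with_ends)  # output: ['one\r\n', 'two\n', 'three\n']
```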
Method #4. Splitting Python strings using Regular Expressions
You can also split a Python string with a regular expression. To do this, import the re module and call re.split(), passing the regular expression as the first argument and the string to split as the second.
import re
print(re.split('[_^#]', 'one#two_three^four')) # output: ['one', 'two', 'three', 'four']
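A few further re.split() variations, sketched with made-up input strings: a delimiter followed by optional whitespace, a capturing group that keeps the delimiters in the result, and the maxsplit argument.

```python
import re

# Split on a comma or semicolon followed by any amount of whitespace.
items = re.split(r'[,;]\s*', 'one, two;three,   four')
print(items)    # output: ['one', 'two', 'three', 'four']

# A capturing group in the pattern keeps the delimiters in the result.
tokens = re.split(r'([-+])', '1+2-3')
print(tokens)   # output: ['1', '+', '2', '-', '3']

# maxsplit limits the number of splits, as with str.split().
limited = re.split(r'\s+', 'a b c d', maxsplit=2)
print(limited)  # output: ['a', 'b', 'c d']
```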
Method #5. Getting a range of characters from a string
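Since a Python string is a sequence, you can take a range of its characters with slice notation, just as you do for lists and other sequences. The slice starts at the first index and stops just before the second; an end index past the end of the string is treated as the end of the string.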
print('Python string'[2:14]) # output: thon string
If you omit the first number in the range, the slice starts at the beginning of the string; if you omit the second, it runs to the end of the string.
print('Python string'[:6]) # output: Python
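The slicing rules can be sketched a little further. The negative index and the step value shown here go beyond what the text above describes and are included only as illustration.

```python
text = 'Python string'

# Omit the second number to take everything up to the end.
print(text[7:])     # output: string

# A negative start counts from the end of the string.
print(text[-6:])    # output: string

# An end index past the end of the string is simply clamped.
print(text[2:100])  # output: thon string

# A third number sets the step; -1 walks the string backwards.
print(text[::-1])   # output: gnirts nohtyP
```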
Python provides a rich set of methods for splitting strings. You can use the split(), rsplit(), and splitlines() methods to split text on specific characters, substrings, or line breaks. In more complex situations, you can split with regular expressions, and in simple cases it may be sufficient to take a range of characters from the string.