Parsing XML in Python

To parse an XML document with Python, you can use the built-in xml.etree.ElementTree module, which provides a lightweight and efficient API for parsing XML data strings. The ElementTree.fromstring() method parses the provided XML document and returns an ElementTree object representing the root element of the XML structure. With this object, you can access tags, attributes, and text elements of the XML. In this Python XML parsing example, we process the XML document using the xml.etree.ElementTree.fromstring() method. You can find more examples of parsing XML in Python below. Click Execute to run the Python Parse XML example online and see the result.
Parsing XML in Python Execute
import xml.etree.ElementTree as et

xml_data = '''
<Order>
    <Customer>Jack</Customer>
    <Rating>6.0</Rating>
</Order>
'''

root = et.fromstring(xml_data)

customer = root.find('Customer').text

print(f"Customer: {customer}")
Updated: Viewed: 1420 times

What is Python?

Python is a high-level interpreted object-oriented programming language. Python includes high-level data structures, dynamic typing, dynamic binding, and an extensive standard library. Python supports various programming paradigms—procedural, object-oriented, and functional, which makes Python a versatile tool for software development. Python is widely used for parsing data in various formats, including XML and data analysis. With its ubiquity and ability to run on almost any system architecture, Python is a general-purpose language used in many client and server applications.

What is XML?

XML (Extensible Markup Language) is an extensively used, extensible markup language that provides structure to a variety of information types, including general data, documents, system configurations, and more. XML supports user-defined tags to establish the structure and types of data, resulting in a format that is readable by both machines and humans. XML is widely used in web services, document stores, and data exchange due to its simplicity and flexibility.

XML Example
<Request>
  <Login>
     login
  </Login>
  <Password>
     password
  </Password>
</Request>

What is XML Parsing in Python?

XML parsing in Python is the process of reading an XML document and providing access to its content and structure in a usable format. Python provides several libraries for parsing XML documents, including xml.etree.ElementTree (also known simply as ElementTree), xml.dom.minidom, and xml.sax. Libraries offer different ways to parse XML files, depending on the size of the document and the specific requirements of the task being solved:

  • xml.etree.ElementTree is a lightweight and efficient library for parsing and creating XML data. Provides methods for parsing XML files into tree structures, making accessing different parts of an XML document easier;
  • xml.dom.minidom implements the Document Object Model (DOM) interface for Python. This interface allows Python programs to create and parse XML files as a tree, allowing random access to any document part;
  • xml.sax is another Python XML parsing library that offers a different approach. It is an event-driven parser that reads XML documents sequentially and fires events when it encounters start tags, end tags, text data, etc.

Python Parse XML Examples

The following are examples of XML parsing in Python:

Parsing XML using xml.etree.ElementTree

The xml.etree.ElementTree library provides a simple and efficient way parse XML documents. To get started, import the ElementTree module using the import statement. You can parse an XML string using the fromstring() method. The fromstring() method creates an Element object directly from an XML string. The following is an example of parsing an XML string using the xml.etree.ElementTree library:

Python XML Parser Example
import xml.etree.ElementTree as et

xml_data = '''
<Request>
  <Login>Leo</Login>
  <Password>pass123</Password>
</Request>
'''

root = et.fromstring(xml_data)

password = root.find('Password').text

print(f"Password: {password}")

Parsing XML using xml.dom.minidom

Python also provides another built-in library xml.dom.minidom which offers an alternative way to parse XML documents. The following is an example of parsing an XML string using the xml.dom.minidom library:

Python XML Parser with xml.dom.minidom Example
import xml.dom.minidom as minidom

xml_data = '''
<Order>
  <Customer>Jack</Customer>
  <Rating>6.0</Rating>
</Order>
'''

dom = minidom.parseString(xml_data)

customer = dom.getElementsByTagName('Customer')[0].childNodes[0].data
rating = float(dom.getElementsByTagName('Rating')[0].childNodes[0].data)

print(f'Customer: {customer}')
print(f'Rating: {rating}')

Parsing large XML files

The xml.sax library is particularly useful when dealing with large XML files that can't be loaded into memory. This library provides a way to parse the large files incrementally. The following is an example of parsing an XML string using the xml.sax library:

Python Parse Large XML Files
import xml.sax

class OrderHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        self.currentElement = name

    def endElement(self, name):
        if self.currentElement == 'Customer':
            print(f'Customer: {self.currentData}')
        elif self.currentElement == 'Rating':
            print(f'Rating: {self.currentData}')
        self.currentElement = ''

    def characters(self, content):
        self.currentData = content

xml_data = '''
<Order>
  <Customer>Jack</Customer>
  <Rating>6.0</Rating>
</Order>
'''

parser = xml.sax.make_parser()
parser.setContentHandler(OrderHandler())
parser.feed(xml_data)

Parsing attributes in XML with ElementTree

The following is an example of processing attributes in XML using the ElementTree. The xml.etree.ElementTree module is used to parse XML data, and the find() method is used to find specific elements in XML.

Python XML Parser with xml.dom.minidom Example
import xml.etree.ElementTree as et

xml_data = '''
<Order>
  <Customer>Jack</Customer>
  <Rating>6.0</Rating>
</Order>
'''

root = et.fromstring(xml_data)

customer = root.find('Customer').text
rating = float(root.find('Rating').text)

print(f'Customer: {customer}, Rating: {rating}')

See also