How to Parse XML in Python?

Updated: Jul 27, 2023 Viewed: 1420 times

Author: ReqBin

What is Python?

Python is a high-level interpreted object-oriented programming language. Python includes high-level data structures, dynamic typing, dynamic binding, and an extensive standard library. Python supports various programming paradigms—procedural, object-oriented, and functional, which makes Python a versatile tool for software development. Python is widely used for parsing data in various formats, including XML and data analysis. With its ubiquity and ability to run on almost any system architecture, Python is a general-purpose language used in many client and server applications.

What is XML?

XML (Extensible Markup Language) is an extensively used, extensible markup language that provides structure to a variety of information types, including general data, documents, system configurations, and more. XML supports user-defined tags to establish the structure and types of data, resulting in a format that is readable by both machines and humans. XML is widely used in web services, document stores, and data exchange due to its simplicity and flexibility.

XML Example

<Request>
  <Login>
     login
  </Login>
  <Password>
     password
  </Password>
</Request>

What is XML Parsing in Python?

XML parsing in Python is the process of reading an XML document and providing access to its content and structure in a usable format. Python provides several libraries for parsing XML documents, including xml.etree.ElementTree (also known simply as ElementTree), xml.dom.minidom, and xml.sax. Libraries offer different ways to parse XML files, depending on the size of the document and the specific requirements of the task being solved:

xml.etree.ElementTree is a lightweight and efficient library for parsing and creating XML data. Provides methods for parsing XML files into tree structures, making accessing different parts of an XML document easier;
xml.dom.minidom implements the Document Object Model (DOM) interface for Python. This interface allows Python programs to create and parse XML files as a tree, allowing random access to any document part;
xml.sax is another Python XML parsing library that offers a different approach. It is an event-driven parser that reads XML documents sequentially and fires events when it encounters start tags, end tags, text data, etc.

Python Parse XML Examples

The following are examples of XML parsing in Python:

Parsing XML using xml.etree.ElementTree

The xml.etree.ElementTree library provides a simple and efficient way parse XML documents. To get started, import the ElementTree module using the import statement. You can parse an XML string using the fromstring() method. The fromstring() method creates an Element object directly from an XML string. The following is an example of parsing an XML string using the xml.etree.ElementTree library:

Python XML Parser Example

import xml.etree.ElementTree as et

xml_data = '''
<Request>
  <Login>Leo</Login>
  <Password>pass123</Password>
</Request>
'''

root = et.fromstring(xml_data)

password = root.find('Password').text

print(f"Password: {password}")

Parsing XML using xml.dom.minidom

Python also provides another built-in library xml.dom.minidom which offers an alternative way to parse XML documents. The following is an example of parsing an XML string using the xml.dom.minidom library:

Python XML Parser with xml.dom.minidom Example

import xml.dom.minidom as minidom

xml_data = '''
<Order>
  <Customer>Jack</Customer>
  <Rating>6.0</Rating>
</Order>
'''

dom = minidom.parseString(xml_data)

customer = dom.getElementsByTagName('Customer')[0].childNodes[0].data
rating = float(dom.getElementsByTagName('Rating')[0].childNodes[0].data)

print(f'Customer: {customer}')
print(f'Rating: {rating}')

Parsing large XML files

The xml.sax library is particularly useful when dealing with large XML files that can't be loaded into memory. This library provides a way to parse the large files incrementally. The following is an example of parsing an XML string using the xml.sax library:

Python Parse Large XML Files

import xml.sax

class OrderHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        self.currentElement = name

    def endElement(self, name):
        if self.currentElement == 'Customer':
            print(f'Customer: {self.currentData}')
        elif self.currentElement == 'Rating':
            print(f'Rating: {self.currentData}')
        self.currentElement = ''

    def characters(self, content):
        self.currentData = content

xml_data = '''
<Order>
  <Customer>Jack</Customer>
  <Rating>6.0</Rating>
</Order>
'''

parser = xml.sax.make_parser()
parser.setContentHandler(OrderHandler())
parser.feed(xml_data)

Parsing attributes in XML with ElementTree

The following is an example of processing attributes in XML using the ElementTree. The xml.etree.ElementTree module is used to parse XML data, and the find() method is used to find specific elements in XML.

Python XML Parser with xml.dom.minidom Example

import xml.etree.ElementTree as et

xml_data = '''
<Order>
  <Customer>Jack</Customer>
  <Rating>6.0</Rating>
</Order>
'''

root = et.fromstring(xml_data)

customer = root.find('Customer').text
rating = float(root.find('Rating').text)

print(f'Customer: {customer}, Rating: {rating}')

Parsing XML in Python

What is Python?

What is XML?

What is XML Parsing in Python?

Python Parse XML Examples

Parsing XML using xml.etree.ElementTree

Parsing XML using xml.dom.minidom

Parsing large XML files

Parsing attributes in XML with ElementTree

See also

What do you use ReqBin for the most?

What is Python?

What is XML?

What is XML Parsing in Python?

Python Parse XML Examples

Parsing XML using xml.etree.ElementTree

Parsing XML using xml.dom.minidom

Parsing large XML files

Parsing attributes in XML with ElementTree

See also

Python Parse XML Related API examples and articles

What do you use ReqBin for the most?