Parsing XML for the text/cdata between a particular start/end tag

Published: Wednesday, Dec 26, 2007 Last modified: Saturday, Mar 23, 2024

Here is a code snippet whereby, it would load up the file object into memory, and get the text between a tag. For example a tag such as:

<school>University of Helsinki</school>

contained in a example file /tmp/myinfo.xml I would now run make use of the snippet like:

file_object = open('/tmp/myinfo.xml')

print parse(file_object, 'school')

file_object.close()

The snippet:

from xml.dom import minidom

def parse(infile, tag):
    output = ''
    xmldoc = minidom.parse(infile)
    grab = xmldoc.getElementsByTagName(tag)
    for data in grab:
            if _debug: print data.toxml()
            childNodes = data.childNodes
            for node in childNodes:
                    output += node.data
    return output + '\n'

Notice I iterated through “grab”, as in XML the earlier piece of xml could have be written like:

<school>

University
            of

Helsinki
                         </school>