Parsing XML for the text/cdata between a particular start/end tag
Published: Wednesday, Dec 26, 2007 Last modified: Thursday, Nov 14, 2024
Here is a code snippet that loads a file object into memory and returns the text between a given start and end tag. For example, given a tag such as:
<school>University of Helsinki</school>
contained in an example file /tmp/myinfo.xml, I would make use of the snippet like this:
with open('/tmp/myinfo.xml') as file_object:
    print(parse(file_object, 'school'))
The snippet:
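from xml.dom import minidom

_debug = False  # set to True to print each matched element

def parse(infile, tag):
    """Return the text/CDATA inside every <tag> element of infile."""
    output = ''
    xmldoc = minidom.parse(infile)
    grab = xmldoc.getElementsByTagName(tag)
    for data in grab:
        if _debug: print(data.toxml())
        # An element's content may be split across several child nodes
        for node in data.childNodes:
            # Skip child elements and comments; keep only text and CDATA
            if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
                output += node.data
    return output + '\n'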
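To see why the child nodes are collected one by one, here is a minimal sketch that runs the same kind of loop on an in-memory document mixing plain text with a CDATA section. The helper name element_text and the sample XML are made up for illustration, and minidom.parseString is used instead of a file only so the example is self-contained.

```python
from xml.dom import minidom

def element_text(xml_string, tag):
    """Join the text and CDATA children of every <tag> element (illustrative helper)."""
    doc = minidom.parseString(xml_string)
    parts = []
    for element in doc.getElementsByTagName(tag):
        for node in element.childNodes:
            # Plain text and CDATA sections arrive as separate child nodes
            if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
                parts.append(node.data)
    return ''.join(parts)

# Hypothetical sample: plain text followed by a CDATA section
sample = '<info><school>University of <![CDATA[Helsinki & Espoo]]></school></info>'
print(element_text(sample, 'school'))
```

The CDATA section lets the ampersand appear unescaped, and both pieces end up in the returned string.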
Notice that in the snippet I iterate through “grab” and through each element's child nodes rather than assuming a single text node, as the same piece of XML could equally have been written like:
<school>
University
of
Helsinki
</school>
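With the multi-line form, the text comes back with its line breaks intact. If a single-line value is wanted, one option is to collapse the whitespace afterwards. This is a minimal sketch, assuming that runs of whitespace inside the tag carry no meaning; collapsed_text is a hypothetical helper and not part of the original snippet.

```python
from xml.dom import minidom

def collapsed_text(xml_string, tag):
    """Return the text of the first <tag> element with whitespace runs collapsed."""
    doc = minidom.parseString(xml_string)
    element = doc.getElementsByTagName(tag)[0]
    raw = ''.join(
        node.data for node in element.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )
    # str.split() with no arguments splits on any whitespace, newlines included
    return ' '.join(raw.split())

multi_line = """<school>
University
of
Helsinki
</school>"""
print(collapsed_text(multi_line, 'school'))
```

Joining on a single space keeps the words separated while dropping the leading and trailing newlines.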