I am sure there are other XML beautifiers, but I couldn't find one that worked for me. (I am sure someone will post better solutions in the comments.) In the end, the following simple Python script did the trick. I found it here and corrected it a little to take care of </> tags. It has worked perfectly on the many XML dumps I have used it on.
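#!/usr/bin/python
import sys
import re

data = open(sys.argv[1],'r').read()
# The capturing group makes re.split keep the tags in the result list.
fields = re.split('(<.*?>)',data)
level = 0
for f in fields:
    if f.strip() == '': continue
    if f[0]=='<' and f[1] != '/':
        # Opening tag: print at the current depth, then go one level deeper.
        print(' '*(level*4) + f)
        level = level + 1
        if f[-2:] == '/>':
            # Self-closing tag: it opens nothing, so undo the increment.
            level = level - 1
    elif f[:2]=='</':
        # Closing tag: step back out before printing.
        level = level - 1
        print(' '*(level*4) + f)
    else:
        # Text between tags.
        print(' '*(level*4) + f)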
It's all about keeping track of depth.
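To make the depth tracking easier to try out, here is a minimal sketch of the same idea wrapped in a function that returns a string instead of printing. The name `beautify` and the `indent` parameter are my own additions for illustration, and unlike the script above it strips the whitespace around each field. It shares the script's limits: declarations such as `<?xml ...?>` and comments are treated as opening tags, and a tag that spans several lines is not recognised.

```python
import re

def beautify(xml_text, indent=4):
    """Return xml_text re-indented, with one tag or text run per line."""
    lines = []
    level = 0
    # The capturing group makes re.split keep the tags in the result.
    for field in re.split(r'(<.*?>)', xml_text):
        field = field.strip()
        if not field:
            continue
        if field.startswith('</'):
            # Closing tag: step back out before emitting it.
            level -= 1
            lines.append(' ' * (level * indent) + field)
        elif field.startswith('<'):
            lines.append(' ' * (level * indent) + field)
            # Self-closing tags open nothing, so the depth stays the same.
            if not field.endswith('/>'):
                level += 1
        else:
            # Text between tags sits at the current depth.
            lines.append(' ' * (level * indent) + field)
    return '\n'.join(lines)

print(beautify('<a><b>text</b><c/></a>'))
```

For the sample string, `<b>` and `<c/>` come out one level inside `<a>`, and `text` one level inside `<b>`.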
4 comments:
Thanks! This works great.
Thanks for your useful, simple script.
I extended it to preserve CDATA sections that contain HTML tags:
#!c:/Pythom26/python26.exe
# XMLBeautifier.py
# Source: http://jyro.blogspot.com/2009/08/makeshift-xml-beautifier-in-python.html
# modified by Thomas Haeny (dev@haeny.de), 9.8.2010
import sys
import re

# settings:
preserveCDATA = 1
intendCols = 4

data = open(sys.argv[1],'r').read()
fields = re.split('(<.*?>)',data)
level = 0
cdataFlag = 0
cdata = ''
for f in fields:
    if f.strip() == '': continue
    if preserveCDATA:
        # Rejoin CDATA sections that the split broke apart at their HTML tags.
        # NOTE: the lines between the CDATA test and "cdataFlag=0" were lost
        # when this comment was posted; the start/accumulate/end logic below
        # is an assumed reconstruction, not the original code.
        if f[:8] == '<![CDATA':
            cdataFlag = 1
            cdata = ''
        if cdataFlag:
            cdata = cdata + f
            if cdata[-3:] == ']]>':
                cdataFlag = 0
                print(' '*(level*intendCols) + cdata)
            continue
    if f[0]=='<' and f[1] != '/' and f[1] != '!':
        print(' '*(level*intendCols) + f)
        level = level + 1
        if f[-2:] == '/>':
            level = level - 1
    elif f[:2]=='</':
        level = level - 1
        print(' '*(level*intendCols) + f)
    else:
        print(' '*(level*intendCols) + f)
You can decide to publish it or not.
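For anyone wondering why CDATA needs special handling: the split pattern `<.*?>` stops at the first `>`, so a CDATA section with HTML tags inside is cut into pieces. A possible alternative to rejoining the pieces afterwards is to let the pattern match a whole CDATA section first. The sketch below is only that, a sketch: `TOKEN` and `tokenize` are names made up for illustration, and it is not the approach used in the script above. Note that `re.DOTALL` also lets ordinary tags span several lines, which the original pattern does not.

```python
import re

# Try the CDATA alternative first so a whole section becomes one token;
# re.DOTALL lets a section (or a tag) span several lines.
TOKEN = re.compile(r'(<!\[CDATA\[.*?\]\]>|<.*?>)', re.DOTALL)

def tokenize(xml_text):
    """Split xml_text into tags, CDATA sections and text, dropping blank runs."""
    return [t for t in TOKEN.split(xml_text) if t.strip()]

sample = '<d><![CDATA[<b>hi</b>]]></d>'

# The plain pattern cuts the CDATA section at its inner tags...
print([t for t in re.split(r'(<.*?>)', sample) if t.strip()])
# ...while the combined pattern keeps it in one piece.
print(tokenize(sample))
```

The token list can then be fed to the same depth-tracking loop; a token starting with `<!` is printed at the current level without changing it.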
Thanks for the improvements.
Over time, I found out about xmllint. It's a nice little utility that can beautify XML.
Very cool - I know there are more elaborate tools out there, but I needed it for just a couple files and your script is perfect.