Installing XMLLint to Parse and Format XML Data

Format XML File

How often do you receive an XML file that came straight out of everyone's favorite operating system, Windows, all on one line, with no indentation and the wrong line endings? Don't worry! xmllint comes to the rescue.

Installing on CentOS, Fedora

sudo yum -y install libxml2

Installing on Debian, Ubuntu

sudo apt-get install libxml2-utils

Installing on openSUSE

sudo zypper install libxml2-tools
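
Whichever package manager you use, a quick way to confirm the install worked is to check that `xmllint` is on your PATH and ask it for its version. A minimal sketch in POSIX shell (the messages it prints are just illustrative):

```shell
# Check that the xmllint binary is on the PATH after installing the package.
if command -v xmllint >/dev/null 2>&1; then
    echo "xmllint found at: $(command -v xmllint)"
    # --version reports the libxml2 version on stderr, so merge it into stdout.
    xmllint --version 2>&1 | head -n 1
else
    echo "xmllint not found; the package name differs between distributions" >&2
fi
```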

Formatting a Badly Formatted XML File

xmllint --format badlyFormattedXMLData.xml > wellFormattedXMLData.xml

Here is what the input and output of the above command look like:

badlyFormattedXMLData.xml (input)

<?xml version="1.0" encoding="UTF-8"?><note><to>myself</to></note>

wellFormattedXMLData.xml (output)

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>myself</to>
</note>
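
The sketch below (POSIX shell, assuming `xmllint` is installed) reproduces this example in a scratch directory. It also shows two things the single command leaves out: the `XMLLINT_INDENT` environment variable, which changes the indentation string, and a safe way to reformat a file in place. The file name fourSpaces.xml and the .tmp suffix are made up for the example.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The one-line input from the example above.
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?><note><to>myself</to></note>' > badlyFormattedXMLData.xml

# Default formatting indents with two spaces.
xmllint --format badlyFormattedXMLData.xml > wellFormattedXMLData.xml
cat wellFormattedXMLData.xml

# XMLLINT_INDENT sets the indentation string; here, four spaces.
XMLLINT_INDENT='    ' xmllint --format badlyFormattedXMLData.xml > fourSpaces.xml

# To reformat a file in place, write to a temporary file and rename it only if
# xmllint succeeded. "xmllint --format f.xml > f.xml" would NOT work: the shell
# truncates f.xml before xmllint gets to read it.
xmllint --format badlyFormattedXMLData.xml > badlyFormattedXMLData.xml.tmp &&
    mv badlyFormattedXMLData.xml.tmp badlyFormattedXMLData.xml
```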

What else can `xmllint` do?

xmllint does much more than formatting. Among other things, it can parse documents and check that they are well-formed, validate them against a DTD or schema, drop ignorable blank text nodes, and evaluate XPath expressions.
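
As a rough illustration of the checking and whitespace-dropping features, the sketch below (POSIX shell, with made-up file names good.xml, broken.xml, spaced.xml and compact.xml) uses `--noout` to test well-formedness through the exit status, and `--noblanks` to drop whitespace-only text nodes between elements.

```shell
# Scratch directory for the example files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A well-formed document and a broken one (the <to> element is never closed).
printf '%s\n' '<note><to>myself</to></note>' > good.xml
printf '%s\n' '<note><to>myself</note>' > broken.xml

# --noout parses the file without printing the tree;
# the exit status is 0 only if parsing succeeded.
for f in good.xml broken.xml; do
    if xmllint --noout "$f" 2>/dev/null; then
        echo "$f: well-formed"
    else
        echo "$f: not well-formed"
    fi
done

# Whitespace-only text nodes between elements, as in a hand-indented file.
printf '<note>\n    <to>myself</to>\n</note>\n' > spaced.xml

# --noblanks drops those ignorable blanks, so the elements end up on one line.
xmllint --noblanks spaced.xml > compact.xml
cat compact.xml
```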

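Running `xmllint` with no arguments prints a usage summary that lists its options (the exact list depends on your libxml2 version):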

Usage : xmllint [options] XMLfiles ...
        Parse the XML files and output the result of the parsing
        --version : display the version of the XML library used
        --debug : dump a debug tree of the in-memory document
        --shell : run a navigating shell
        --debugent : debug the entities defined in the document
        --copy : used to test the internal copy implementation
        --recover : output what was parsable on broken XML documents
        --huge : remove any internal arbitrary parser limits
        --noent : substitute entity references by their value
        --noenc : ignore any encoding specified inside the document
        --noout : don't output the result tree
        --path 'paths': provide a set of paths for resources
        --load-trace : print trace of all external entites loaded
        --nonet : refuse to fetch DTDs or entities over network
        --nocompact : do not generate compact text nodes
        --htmlout : output results as HTML
        --nowrap : do not put HTML doc wrapper
        --valid : validate the document in addition to std well-formed check
        --postvalid : do a posteriori validation, i.e after parsing
        --dtdvalid URL : do a posteriori validation against a given DTD
        --dtdvalidfpi FPI : same but name the DTD with a Public Identifier
        --timing : print some timings
        --output file or -o file: save to a given file
        --repeat : repeat 100 times, for timing or profiling
        --insert : ad-hoc test for valid insertions
        --compress : turn on gzip compression of output
        --html : use the HTML parser
        --xmlout : force to use the XML serializer when using --html
        --nodefdtd : do not default HTML doctype
        --push : use the push mode of the parser
        --pushsmall : use the push mode of the parser using tiny increments
        --memory : parse from memory
        --maxmem nbbytes : limits memory allocation to nbbytes bytes
        --nowarning : do not emit warnings from parser/validator
        --noblanks : drop (ignorable?) blanks spaces
        --nocdata : replace cdata section with text nodes
        --format : reformat/reindent the input
        --encode encoding : output in the given encoding
        --dropdtd : remove the DOCTYPE of the input docs
        --pretty STYLE : pretty-print in a particular style
                         0 Do not pretty print
                         1 Format the XML content, as --format
                         2 Add whitespace inside tags, preserving content
        --c14n : save in W3C canonical format v1.0 (with comments)
        --c14n11 : save in W3C canonical format v1.1 (with comments)
        --exc-c14n : save in W3C exclusive canonical format (with comments)
        --nsclean : remove redundant namespace declarations
        --testIO : test user I/O support
        --catalogs : use SGML catalogs from $SGML_CATALOG_FILES
                     otherwise XML Catalogs starting from
                     file:///etc/xml/catalog are activated by default
        --nocatalogs: deactivate all catalogs
        --auto : generate a small doc on the fly
        --xinclude : do XInclude processing
        --noxincludenode : same but do not generate XInclude nodes
        --nofixup-base-uris : do not fixup xml:base uris
        --loaddtd : fetch external DTD
        --dtdattr : loaddtd + populate the tree with inherited attributes 
        --stream : use the streaming interface to process very large files
        --walker : create a reader and walk though the resulting doc
        --pattern pattern_value : test the pattern support
        --chkregister : verify the node registration code
        --relaxng schema : do RelaxNG validation against the schema
        --schema schema : do validation against the WXS schema
        --schematron schema : do validation against a schematron
        --sax1: use the old SAX1 interfaces for processing
        --sax: do not build a tree but work just at the SAX level
        --oldxml10: use XML-1.0 parsing rules before the 5th edition
        --xpath expr: evaluate the XPath expression, imply --noout
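
Two of these options are worth a quick sketch: `--xpath` to pull a value out of a document, and `--schema` to validate it against a W3C XML Schema. The schema note.xsd here is a hypothetical one written just for the example note; your own schema will differ.

```shell
# Scratch directory for the example files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > note.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>myself</to>
</note>
EOF

# Hypothetical schema: a <note> must contain exactly one <to> string element.
cat > note.xsd <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
EOF

# Evaluate an XPath expression; --xpath implies --noout, so only the result is printed.
recipient=$(xmllint --xpath 'string(/note/to)' note.xml)
echo "Recipient: $recipient"

# Validate against the schema. xmllint reports "note.xml validates" (or the errors)
# on stderr, and the exit status is 0 only when the document is valid.
if xmllint --noout --schema note.xsd note.xml; then
    echo "note.xml is valid"
else
    echo "note.xml is NOT valid"
fi
```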

Hope it helps!
