Easy XML to CSV Conversion
Countless tools, libraries and algorithms in use today assume inputs are already structured as rectangular or tabular data sets defined in terms of rows and columns. While the nomenclature can vary depending on the domain it all essentially refers to the same paradigm.
For instance if you are using machine learning libraries, columns are often referred to as 'features' while rows are expected to represent 'observations'.
Whatever the domain, the need to be able to take a messy, nested data structure such as XML and convert it into an easily digestible data set for the existing tools set remains.
Current Solutions
A quick internet search will turn up of a number solutions that you soon realise are difficult to leverage. Broadly speaking, anyone attempting to flatten out XML to CSV comes across one or more of these challenges:
-
The solution is too simplistic. For instance, it lacks the ability to recursively expand out nested lists or customise attribute names.
-
The solution is part of a larger 'framework' that cannot be used standalone. Oracle XMLTable for instance, requires you to install the Oracle Database first.
-
The solution is purpose built for a specific domain / XSD definition. A good example is Flexter, a tool which has purpose built implementations that are highly optimised for specific format conversions, e.g. FIXML to MS SQL Server.
-
The solution is too expensive. And the cost cannot be justified given the size of your project.
Introducing the XML-Flattener
The XML Flattener (https://github.com/DevWorxCo/xml-flattener) is an open-source standalone tool with minimal dependencies that can quickly convert entire directories containing large numbers of XML files to a form that can be consumed by tools such as R Studio, Python Pandas or Excel.
To run the "Hello World" example, you simply need to have Java 8 installed and run the following commands (see README.md file on GitHub)
Linux
git clone https://github.com/DevWorxCo/xml-flattener.git
cd xml-flattener
wget -P target https://www.devworx.co.uk/assets/jars/xml-flattener-exec.jar
java -jar target/xml-flattener-exec.jar examples/Hello-World/hello-world.yml
Windows
git clone https://github.com/DevWorxCo/xml-flattener.git
cd xml-flattener
curl -o target/xml-flattener-exec.jar --create-dirs https://www.devworx.co.uk/assets/jars/xml-flattener-exec.jar
java -jar target/xml-flattener-exec.jar examples/Hello-World/hello-world.yml
The above commands will produce the examples/Hello-World/output/continents-flattened.csv
file.
Contact
I would be interested to hear your feedback on this and other topics. Feel free to get in touch.