Is there a general name for computer data text files organized like this?

I’m using .kml and .gpx files, which are data files for GPS tracks such as a hiker records. They are human-readable text, but they are formatted in similar distinctive ways. The structure is nested, with these <name> and </name> delimiters containing predictable blocks between them, like an ISO 8601 timestamp and a set of coordinates. Is there a name for, generally, structuring a data file like this, with this form of start-and-stop delimiters?

Here’s a .kml example, the first few lines of a file:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<ExtendedData>
<Data name="GPSModelName">
<value>Dual XGPS160</value>
</Data>
</ExtendedData>
<Placemark>
<gx:Track>
<altitudeMode>absolute</altitudeMode>
<when>2017-02-04T18:30:31Z</when><gx:coord>-75.809845 39.722271 41.1</gx:coord>
<when>2017-02-04T18:30:32Z</when><gx:coord>-75.809814 39.722229 44.2</gx:coord>
<when>2017-02-04T18:30:33Z</when><gx:coord>-75.809723 39.722229 47.2</gx:coord>

XML. The first line is what says it is so, along with the version and encoding.

Were you looking for something more specific to your particular file?

More generally this is an example of a structured data document type - XML is the most common example but hardly the only one. Here is a pretty decent primer on XML structure/function.

Hey, thanks! Just what I needed to know! Not looking for something more specific about this file. I wanted to find out more about pre-existing tools and code snippets for extracting information from the files, and now I have an angle to work. Thanks!

Glad to help. Now that you know what to search for, you will find loads of different libraries and tools for dealing with these files.

XML is a pain in the ass. But it has one advantage: it’s very universal, which means there’s a ton of tooling available for working with it. It also has a disadvantage: it’s very universal, which means there’s a ton of tooling available for working with it.

If you get to choose which format you are working with, you might want to consider JSON instead. It has less overhead (in terms of the number of wasted characters) and the parsers are generally easier to use.
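For comparison, here is what one track point could look like in JSON (a hypothetical layout, not any standard GPS schema), and parsing it is a one-liner with Python's standard library:

```python
import json

# Hypothetical JSON layout for one track point (not a standard schema).
point_json = '{"when": "2017-02-04T18:30:31Z", "coord": [-75.809845, 39.722271, 41.1]}'

point = json.loads(point_json)  # one call, no namespaces to wrangle
print(point["coord"][0])        # -75.809845
```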

I don’t get to choose what format to work with. It’s really one very specific thing I need to fix. I’m recording hiking trails, and for a while I used a Dual XGPS160 GPS receiver and logger. Unfortunately one version of its accompanying phone app records my longitude, which is really around -75.5, as around -283.5. This is just a flat out illegal value. I have about 150 miles of recordings written this way. It looks like I could change the sign and subtract from 360, or some similar formula, and recover the tracks. I’m just trying to figure out how to do that. Knowing this is XML may let me use existing tools to handle most of the logic to break each file down, change this one value, and build a new one back up that is otherwise the same.
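A minimal sketch of that repair using Python's built-in ElementTree, assuming the bad values are exactly the longitudes below -180 and that -(bad + 360) recovers the true longitude (the function name and file handling here are my own invention, not an existing tool):

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
GX_NS = "http://www.google.com/kml/ext/2.2"

def fix_longitudes(src_path, dst_path):
    """Rewrite a .kml file, correcting any longitude below -180 degrees."""
    # Register the prefixes so the output keeps the original tag spellings.
    ET.register_namespace("", KML_NS)
    ET.register_namespace("gx", GX_NS)
    tree = ET.parse(src_path)
    for coord in tree.iter(f"{{{GX_NS}}}coord"):
        lon, lat, alt = coord.text.split()
        lon = float(lon)
        if lon < -180.0:
            # Assumed recovery formula: values near -284 map back near -76.
            lon = -(lon + 360.0)
        coord.text = f"{lon:.6f} {lat} {alt}"
    tree.write(dst_path, xml_declaration=True, encoding="UTF-8")
```

Everything else in the file (timestamps, metadata, element order) passes through untouched, which matches the goal of building a new file that is otherwise the same; note that ElementTree may normalize small formatting details like attribute quoting.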

While I’m at it I’d like to extract ALL the coordinates, from these bad files as well as all my good ones. I wrote, years ago, code that can write a .dxf file that Autocad can import, based on a series of points. I may create my own map and have control over all the details. And there’s a park ranger eager to work with me to use the data in their own next generation map, so it’d be nice to be able to clean the data and deliver it in a way they can use well.
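Pulling every point out is the same kind of tree walk; here is a sketch assuming each <when> pairs with the <gx:coord> at the same position inside a gx:Track (the function name is made up):

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
GX_NS = "http://www.google.com/kml/ext/2.2"

def extract_track_points(path):
    """Return (timestamp, lon, lat, alt) tuples from every gx:Track in a .kml file."""
    points = []
    tree = ET.parse(path)
    for track in tree.iter(f"{{{GX_NS}}}Track"):
        # <when> lives in the default KML namespace, <gx:coord> in the gx one.
        whens = track.findall(f"{{{KML_NS}}}when")
        coords = track.findall(f"{{{GX_NS}}}coord")
        for when, coord in zip(whens, coords):
            lon, lat, alt = (float(v) for v in coord.text.split())
            points.append((when.text, lon, lat, alt))
    return points
```

From a list like that it should be straightforward to feed the points into existing .dxf-writing code as polyline vertices.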

Thanks, everybody!

One assumes this should be 284.5 and the silly thing has wrapped you around the globe - and got it wrong.

Code up your stuff in Python and you will have access to enough pre-baked libraries that you could do the entire data massaging task in only a few tens of lines of code.

Similarly, I have a back-burner project that would involve processing SVG files, and I was sort of dreading having to figure out how to manage them. Then, for a different project, I was following some directions I found online and ended up opening an SVG in TextEdit, and discovered that they’re just human-readable XML, too. That’s not nearly so intimidating any more.

Yes, there are a lot of data files out there that are human readable. I like this trend. We have too much important data at work that was stored in some dumb proprietary format and got marooned after a few years. I guess proprietary formats are useful in the short term, like letting a car idle, but it’s no way to leave things stored for the ages.

XML is a ubiquitous abomination.

Sometimes, even if you find a file that is not human-readable, it is secretly a zip file containing a compressed human-readable format.
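That guess is cheap to test; a quick sketch with Python's standard library (the helper name is made up):

```python
import zipfile

def peek_if_zip(path):
    """Return the member names if the file is secretly a zip archive, else None."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()  # often a manifest plus content XML files
    return None
```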

I once had to reverse engineer a proprietary file format that we got from a vendor who had long since gone out of business. It turned out to be just a zip of (you guessed it) XML files. One of the XML files just contained a list of the other XML files.

Note: while XML has serious issues (mostly around security), you can get the exact schemas you need to decode those files.

http://www.opengis.net/kml/2.2

<element name="when" type="kml:dateTimeType"/>

But you will notice that the other schema is now a 404, and external public schema URLs are very risky from a security perspective.

But for common formats there is almost always a good Python module.

https://fastkml.readthedocs.io/en/latest/usage_guide.html#read-a-kml-file-string

In general, JSON, XML, and other documents like this are structured as directed graphs with strict arboricity, or at least something that approaches it.

They are almost always just a set of objects (vertices, or nodes) in a tree structure.
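To make that concrete with Python's ElementTree: parsing gives you exactly that node tree, and a few lines of recursion walk it (toy document and made-up function):

```python
import xml.etree.ElementTree as ET

# A tiny document: one root node with nested children.
doc = ET.fromstring("<Placemark><Track><coord/><coord/></Track></Placemark>")

def depth(node):
    # Iterating an element yields its child elements, so this walks the tree.
    return 1 + max((depth(child) for child in node), default=0)

print(depth(doc))  # 3: Placemark -> Track -> coord
```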