Using STX to process OSM XML data and make a CSV file of all the cities

To produce a CSV of all the cities in the world, I used STX: the files are very big, and I wanted to process them in a memory-efficient manner. Despite what I said about how to get the cities from OSM, I downloaded all the place=* nodes, so I needed to filter them down to only the place=city ones.

Here's the (commented) STX file that I used (download cities-to-csv.stx directly):

<?xml version='1.0' encoding='UTF-8'?>
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns"
               pass-through="none"
               output-method="text"
>

<!-- We need to keep track of only these variables -->
<stx:variable name="lat"/>
<stx:variable name="lon"/>
<stx:variable name="name"/>
<stx:variable name="is_city" />

<stx:template match="/osm/node">

  <!-- When we first see a node, store the latitude and longitude (which are the lat and lon attributes) -->
  <stx:assign name="lat" select="@lat" />
  <stx:assign name="lon" select="@lon" />

  <!-- Since this is the start of a new node, 'reset' the name to blank and set is_city to false, so in the absence of a 'place=city' tag, we will (correctly) interpret this as not a city -->
  <stx:assign name="name" select="''" />
  <stx:assign name="is_city" select="false()" />

  <!-- Now process the child elements, which include the <tag> elements; these tell us the name and whether this node is a city -->
  <!-- The <tag> elements are matched by the <stx:template> rules below, so the processor jumps from here to those templates and then back here -->
  <stx:process-children/>

  <!-- If the is_city variable is true, output a CSV formatted line (lat,lon,name) for this node -->
  <stx:if test="$is_city">
    <stx:value-of select="$lat"/>
    <stx:text>,</stx:text>
    <stx:value-of select="$lon"/>
    <stx:text>,</stx:text>
    <stx:value-of select="$name"/>
    <!-- The closing tag stays at the start of the next line so that only a newline is output -->
    <stx:text>
</stx:text>
  </stx:if>

</stx:template>

<stx:template match="/osm/node/tag[@k='name']">
  <!-- When we see a name tag, store that value -->
  <stx:assign name="name" select="@v" />
</stx:template>

<stx:template match="/osm/node/tag[@k='place']">
  <!-- We can't have two [@...] predicates in one match in STX, so we work around it with this if clause -->
  <stx:if test="@v = 'city'">
      <!-- Since we have seen the place=city tag, set the variable to true -->
      <stx:assign name="is_city" select="true()" />
  </stx:if>
</stx:template>

</stx:transform>

I ran this using the following command (see how to install/run Joost on Ubuntu):

java -jar /path/to/joost.jar places.osm.xml cities-to-csv.stx > cities.csv

You can download the resultant CSV of all the cities in the world.
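
For comparison, here is a minimal sketch of the same streaming idea using only Python's standard library. It is not the STX approach used above, just an illustration under a few assumptions: the function name cities_to_csv and the sample data are made up for the example, it matches any <node> element rather than only /osm/node, and unlike the STX file it quotes names that contain commas (via the csv module).

```python
import csv
import io
import xml.etree.ElementTree as ET


def cities_to_csv(source, out):
    """Stream OSM XML from `source`, writing lat,lon,name rows for place=city nodes to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root <osm> element; keep a handle on it
    _, root = next(context)
    for event, elem in context:
        # Act only once a <node> is complete, i.e. all its <tag> children have been read
        if event != "end" or elem.tag != "node":
            continue
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        if tags.get("place") == "city":
            writer.writerow([elem.get("lat"), elem.get("lon"), tags.get("name", "")])
        # Discard the nodes parsed so far so the whole file is never held in memory
        root.clear()


# Tiny made-up sample standing in for places.osm.xml
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.12">
    <tag k="name" v="London"/>
    <tag k="place" v="city"/>
  </node>
  <node id="2" lat="53.0" lon="-6.5">
    <tag k="name" v="Smallville"/>
    <tag k="place" v="village"/>
  </node>
  <node id="3" lat="38.9" lon="-77.0">
    <tag k="name" v="Washington, D.C."/>
    <tag k="place" v="city"/>
  </node>
</osm>
"""

if __name__ == "__main__":
    buf = io.StringIO()
    cities_to_csv(io.BytesIO(SAMPLE), buf)
    print(buf.getvalue(), end="")
```

Run on the sample, this prints two rows: 51.5,-0.12,London and 38.9,-77.0,"Washington, D.C.". The village is skipped, and the name containing a comma is quoted so it still parses as a single CSV field.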
