8.4 GeoNames Data

The GeoNames geographical database, containing more than 10 million geographic names, is available for download free of charge under a Creative Commons Attribution license. Every feature is assigned to one of nine feature classes and further sub-categorized into one of 645 feature codes.

From ISO 19101, “A feature is an abstraction of real world phenomena and a feature can occur as a type or an instance (4.1.11)”; it is a geographic feature if it is associated with a location relative to the Earth.

This section shows how to download data from the GeoNames geographical database and import it into a MongoDB collection.

8.4.1 – Downloading GeoNames data

The file containing the data for Argentina, AR.zip, was downloaded from the GeoNames Free Gazetteer Data page.

Unpacking AR.zip created the AR folder containing the files AR.txt and readme.txt; the latter holds general information and a description of the fields in the AR.txt file.

8.4.2 – Creating a collection in MongoDB to import GeoNames data

To receive the data from the AR.geojson file, the geonamesar collection was created using the following commands:

// Select the database
use reficio;
// Drop the collection, if it exists
db.geonamesar.drop();
// Create the collection, specifying the JSON Schema validation rules
db.createCollection("geonamesar", {
   validator : {
      $jsonSchema : {
         "properties" : {
            "type" : {
               bsonType : "string",
               description : "FeatureCollection",

            },
            "features" : {
               bsonType : "array",
               "items" : {
                  bsonType : "object",
                  "properties" : {
                     "type" : {
                        bsonType : "string",
                        description : "Feature",

                     },
                     "properties" : {
                        bsonType : "object",
                        "properties" : {
                           "geonameid" : {
                              bsonType : "string",
                              description : "integer id of record in geonames database",

                           },
                           "name" : {
                              bsonType : "string",
                              description : "name of geographical point (utf8) varchar(200)",

                           },
                           "asciiname" : {
                              bsonType : "string",
                              description : "name of geographical point in plain ascii characters, varchar(200)",

                           },
                           "alternatenames" : {
                              bsonType : "string",
                              description : "alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)",

                           },
                           latitude : {
                              bsonType : "double",
                              description : "latitude in decimal degrees (wgs84)",

                           },
                           longitude : {
                              bsonType : "double",
                              description : "longitude in decimal degrees (wgs84)",

                           },
                           "feature_class" : {
                              bsonType : "string",
                              description : "see http://www.geonames.org/export/codes.html, char(1)",

                           },
                           "feature_code" : {
                              bsonType : "string",
                              description : "see http://www.geonames.org/export/codes.html, varchar(10)",

                           },
                           "country_code" : {
                              bsonType : "string",
                              description : "ISO-3166 2-letter country code, 2 characters",

                           },
                           "cc2" : {
                              bsonType : "string",
                              description : "alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters",

                           },
                           "admin1_code" : {
                              bsonType : "string",
                              description : "fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)",

                           },
                           "admin2_code" : {
                              bsonType : "string",
                              description : "code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)",

                           },
                           "admin3_code" : {
                              bsonType : "string",
                              description : "code for third level administrative division, varchar(20)",

                           },
                           "admin4_code" : {
                              bsonType : "string",
                              description : "code for fourth level administrative division, varchar(20)",

                           },
                           "population" : {
                              bsonType : "string",
                              description : "bigint (8 byte int)",

                           },
                           "elevation" : {
                              bsonType : "string",
                              description : "in meters, integer",

                           },
                           "dem" : {
                              bsonType : "string",
                              description : "digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.",

                           },
                           "timezone" : {
                              bsonType : "string",
                              description : "the iana timezone id (see file timeZone.txt) varchar(40)",

                           },
                           "modification_date" : {
                              bsonType : "string",
                              description : "date of last modification in yyyy-MM-dd format",

                           }
                        }
                     },
                     "geometry" : {
                        bsonType : "object",
                        "properties" : {
                           "type" : {
                              bsonType : "string",
                              description : "GeoJSON object type Point",

                           },
                           "coordinates" : {
                              bsonType : "array",
                              "items" : {
                                 bsonType : "number",
                                 description : "Point Longitude, Latitude",

                              }
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
})

8.4.3 – Conversion of GeoNames data to GeoJSON format

The file AR.txt downloaded from GeoNames was renamed AR.csv, and a line with the field names was inserted at its beginning. It was then converted to the GeoJSON format using ogr2ogr, the OGR utility program from GDAL that converts simple features data between file formats, as shown below:

mv AR.txt AR.csv
ogr2ogr -f GeoJSON AR.geojson AR.csv -oo X_POSSIBLE_NAMES=longitude -oo Y_POSSIBLE_NAMES=latitude -oo KEEP_GEOM_COLUMNS=NO
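
The commands above do not show how the line with the field names was inserted, a step that has to happen before ogr2ogr runs. A minimal Python sketch of that step follows. It assumes the field names are the ones used in the schema of section 8.4.2 and that AR.txt is tab-delimited UTF-8, as the GeoNames readme describes. The function name add_header is hypothetical, and the function writes AR.csv directly, so it would stand in for the mv command rather than follow it.

```python
import shutil

# Field names as used in the schema of section 8.4.2 (assumed to match
# the column order described in the GeoNames readme.txt).
GEONAMES_FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code",
    "country_code", "cc2", "admin1_code", "admin2_code",
    "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def add_header(source_path, target_path, fields=GEONAMES_FIELDS):
    """Copy source_path to target_path, preceded by a tab-separated header line."""
    with open(source_path, "r", encoding="utf-8", newline="") as source, \
         open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write("\t".join(fields) + "\n")
        # Stream the original rows unchanged after the header.
        shutil.copyfileobj(source, target)


# Example call (hypothetical file locations):
# add_header("AR.txt", "AR.csv")
```

The header uses the names longitude and latitude, which are the column names that the X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES options in the ogr2ogr command refer to.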

8.4.4 – Importing data

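The file AR.geojson produced by ogr2ogr has a size of 25540263 bytes and holds a single FeatureCollection object, so mongoimport tries to insert it as one document, which exceeds the 16 MB maximum document size supported by MongoDB. When trying to import the file into MongoDB using the command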

mongoimport \
    --stopOnError \
    --db reficio \
    --collection geonamesar \
    --file AR.geojson

the following output was produced:

2018-05-19T11:21:30.537-0300    connected to: localhost
2018-05-19T11:21:32.890-0300    num failures: 1
2018-05-19T11:21:32.891-0300    Failed: BSONObj size: 24787991 (0x17A3C17) is invalid. Size must be between 0 and 16793600(16MB) First element: insert: "geonamesar"
2018-05-19T11:21:32.891-0300    imported 0 documents
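
As a rough pre-check before running mongoimport, the size of the file can be compared with the documented 16 MB limit. The sketch below is only a heuristic: the BSON encoding of a document does not have the same size as its JSON text (25540263 bytes of JSON became 24787991 bytes of BSON in the log above). The function names are hypothetical.

```python
import os

# Documented maximum BSON document size: 16 MB.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def may_exceed_document_limit(size_in_bytes, limit=MAX_BSON_DOCUMENT_BYTES):
    """Return True if a single document of this size is likely too large."""
    return size_in_bytes > limit


def file_may_exceed_document_limit(path):
    """Apply the same check to the size of a file on disk."""
    return may_exceed_document_limit(os.path.getsize(path))


# Size of AR.geojson as reported in the text.
print(may_exceed_document_limit(25540263))  # True
```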

To circumvent this error, the Ruby script split_geojson.rb was used to split AR.geojson into separate files, one for each feature, in the AR folder, using the command:

ruby split_geojson.rb AR.geojson
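
The split_geojson.rb script is not reproduced here. The following Python sketch shows one way such a split could be done; it is an illustration, not the author's script. It assumes that each output file should itself be a FeatureCollection with a single feature, which is the document shape shown in section 8.4.5. The function name and the file naming scheme are hypothetical.

```python
import json
import os


def split_feature_collection(geojson_path, output_dir):
    """Write each feature of a FeatureCollection to its own file.

    Every output file is a FeatureCollection holding exactly one feature.
    Returns the number of files written.
    """
    with open(geojson_path, encoding="utf-8") as source:
        collection = json.load(source)

    os.makedirs(output_dir, exist_ok=True)
    count = 0
    for count, feature in enumerate(collection["features"], start=1):
        single = {"type": "FeatureCollection", "features": [feature]}
        # Hypothetical naming scheme: zero-padded sequence numbers.
        target_path = os.path.join(output_dir, f"{count:06d}.geojson")
        with open(target_path, "w", encoding="utf-8") as target:
            json.dump(single, target, ensure_ascii=False)
    return count


# Example call (hypothetical locations):
# split_feature_collection("AR.geojson", "AR")
```

Loading the whole file with json.load is acceptable for a file of about 25 MB; a much larger file would call for a streaming parser instead.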

To import the 49272 files created in the AR folder by split_geojson.rb, one for each line in the original AR.txt file, the mongoimport.sh script was written as shown below

#!/bin/bash
# Import every file in the AR folder into the geonamesar collection
for filename in AR/*
do
    mongoimport \
        --stopOnError \
        --db reficio \
        --collection geonamesar \
        --file "$filename"
done

and executed.

8.4.5 – Show the number of documents in collection geonamesar and look for the Teatro Colón

> use reficio;
switched to db reficio
> db.geonamesar.count();
49272
> db.geonamesar.find({ $and: [
...     {"features.properties.name" : RegExp("Teatro Colón")},
...     {"features.properties.feature_code" : "OPRA"}
...     ]}
... ).pretty();
{
        "_id" : ObjectId("5b003d812d7374494d1ee276"),
        "type" : "FeatureCollection",
        "features" : [
                {
                        "type" : "Feature",
                        "properties" : {
                                "geonameid" : "7729821",
                                "name" : "Teatro Colón",
                                "asciiname" : "Teatro Colon",
                                "alternatenames" : "Colon Theatre,Kolumbus-Theater,Teatro Colon,Teatro Colón,Theatre Colon,Théâtre Colón",
                                "feature_class" : "S",
                                "feature_code" : "OPRA",
                                "country_code" : "AR",
                                "cc2" : "",
                                "admin1_code" : "07",
                                "admin2_code" : "02001",
                                "admin3_code" : "",
                                "admin4_code" : "",
                                "population" : "0",
                                "elevation" : "",
                                "dem" : "30",
                                "timezone" : "America/Argentina/Buenos_Aires",
                                "modification_date" : "2017-05-08"
                        },
                        "geometry" : {
                                "type" : "Point",
                                "coordinates" : [
                                        -58.38308,
                                        -34.60108
                                ]
                        }
                }
        ]
}
>

8.4.6 – Hotels in Buenos Aires in the neighborhood of Recoleta

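This example combines two collections to display the hotels located in the Recoleta neighborhood of Buenos Aires: the polygon of the neighborhood is read from barrios_porteños and then used to filter the documents in geonamesar.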

> // Database Reficio
> use reficio
switched to db reficio
> // Geometry of the neighborhood of Recoleta from the barrios_porteños collection
> var recoleta = db.barrios_porteños.find( { 'properties.BARRIO' : 'RECOLETA' } ).toArray();
> if ( recoleta.length > 0 ) {
...    var geometry = recoleta[0].geometry.coordinates;
... }
> // Hotels in Buenos Aires in the neighborhood of Recoleta
> db.geonamesar.find(
...   { $and:
...     [
...       {
...          'features.geometry.coordinates': {
...            $geoWithin: {
...              $geometry: {
...                type : "Polygon" ,
...                coordinates: geometry
...              }
...            }
...          }
...       },
...       { 'features.properties.feature_code' : "HTL" }
...     ]
...   } ,
...   { 'features.properties.name' : 1,
...      _id : 0
...   }
... ).sort( { 'features.properties.name' : 1 });
{ "features" : [ { "properties" : { "name" : "1551 palermo boutique hotel" } } ] }
{ "features" : [ { "properties" : { "name" : "Algodon Mansion" } } ] }
{ "features" : [ { "properties" : { "name" : "Alta Piazza" } } ] }
{ "features" : [ { "properties" : { "name" : "Apartments Rent In Buenos Aires" } } ] }
{ "features" : [ { "properties" : { "name" : "Arenales Apart Hotel" } } ] }
{ "features" : [ { "properties" : { "name" : "Arenales Apartments And Suites" } } ] }
{ "features" : [ { "properties" : { "name" : "Art Suites & Gallery" } } ] }
{ "features" : [ { "properties" : { "name" : "Art Suites And Gallery" } } ] }
{ "features" : [ { "properties" : { "name" : "Babel Recoleta" } } ] }
{ "features" : [ { "properties" : { "name" : "Beruti Flats" } } ] }
{ "features" : [ { "properties" : { "name" : "Blue Tree Hotels Recoleta Ker" } } ] }
{ "features" : [ { "properties" : { "name" : "Buenos Aires Grand Hotel Recol" } } ] }
{ "features" : [ { "properties" : { "name" : "Buenos Aires Wilton Hotel" } } ] }
{ "features" : [ { "properties" : { "name" : "Caesar Park Buenos Aires" } } ] }
{ "features" : [ { "properties" : { "name" : "Callao Plaza Suites Apartments" } } ] }
{ "features" : [ { "properties" : { "name" : "Casasur Art" } } ] }
{ "features" : [ { "properties" : { "name" : "Club Frances" } } ] }
{ "features" : [ { "properties" : { "name" : "Concord Callao Suites" } } ] }
{ "features" : [ { "properties" : { "name" : "Cyan Hotel - Ex Dazzler Suites Recoleta" } } ] }
{ "features" : [ { "properties" : { "name" : "Dazzler Laprida Hotel" } } ] }
Type "it" for more
>

8.4.7 – Seeing the hotels on the map

To see the hotels on the map, the Python script hotels_recoleta.py was created; it generates the GPX (GPS eXchange Format) file hotels_recoleta.gpx. The map can be seen below in OpenStreetMap.
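
The hotels_recoleta.py script itself is not reproduced here. As a rough illustration of the idea, the sketch below builds a GPX 1.1 document with one waypoint per hotel, assuming input documents shaped like the one shown in section 8.4.5. The function names build_gpx and waypoint_from_document and the creator string are hypothetical, and the sketch does not query MongoDB.

```python
import xml.etree.ElementTree as ET

GPX_NAMESPACE = "http://www.topografix.com/GPX/1/1"


def waypoint_from_document(document):
    """Extract (name, latitude, longitude) from a one-feature FeatureCollection."""
    feature = document["features"][0]
    # GeoJSON stores coordinates as [longitude, latitude].
    longitude, latitude = feature["geometry"]["coordinates"]
    return feature["properties"]["name"], latitude, longitude


def build_gpx(waypoints, creator="hotels_recoleta sketch"):
    """Return a GPX 1.1 document, as a string, with one <wpt> per waypoint.

    waypoints is an iterable of (name, latitude, longitude) tuples.
    """
    gpx = ET.Element(
        "gpx",
        {"version": "1.1", "creator": creator, "xmlns": GPX_NAMESPACE},
    )
    for name, latitude, longitude in waypoints:
        wpt = ET.SubElement(
            gpx, "wpt", {"lat": str(latitude), "lon": str(longitude)}
        )
        ET.SubElement(wpt, "name").text = name
    return ET.tostring(gpx, encoding="unicode")


# Example: write the result to a file (hypothetical input list `documents`).
# with open("hotels_recoleta.gpx", "w", encoding="utf-8") as target:
#     target.write(build_gpx(waypoint_from_document(d) for d in documents))
```

Note the coordinate order: GeoJSON points are [longitude, latitude], while a GPX waypoint carries separate lat and lon attributes, so the two values have to be assigned explicitly.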
