Commit b8fdd75

author
mgoddard
committed
Reorganizing into directories; working on README
1 parent 9c3f838 commit b8fdd75

12 files changed, +436 -32 lines changed

Procfile

-1
This file was deleted.

README.md

+35-1
@@ -1,13 +1,47 @@
 # CockroachDB Geo Tourist

-## GIS demo: find pubs, restaurants, cafes, etc. using the spatial features of CockroachDB
+## Use the spatial features in CockroachDB to find pubs, restaurants, cafes, etc.

 ![Screenshot restaurants](./restaurants.jpg)
+(App shown running on a laptop)
+
+This is a simple Python Flask and JavaScript app which illustrates some of the
+new spatial capabilities in CockroachDB 20.2. The scenario is this: in the web
+app, an icon represents the user, and each time the page is refreshed this user
+is placed at a location randomly chosen from a set of destinations. The
+JavaScript front end then makes a REST call that includes the type of
+_amenity_ to search for as well as the user's location. Within the Python Flask
+app, those values are used in a SQL query against a CockroachDB instance
+loaded with spatial data. This query uses the following spatial data types,
+operators, and indexes to find and return a set of the nearest amenities,
+sorted by distance:
+
+1. `GEOGRAPHY`: the data type used to represent the `POINT` associated with each amenity
+1. `ST_Distance`: used to calculate the distance from the user to each of these locations
+1. `ST_Y` and `ST_X`: used to retrieve the latitude and longitude of each of these points, for plotting onto the map
+1. `ST_DWithin`: used in the `WHERE` clause of the SQL query to constrain the results to points within 5 km of the user's location
+1. `ST_MakePoint`: converts the longitude and latitude representing the user's location into a `POINT`
+1. A GIN index on the `ref_point` column in the `osm` table speeds up the calculation done by `ST_DWithin`
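To make the list above concrete, here is a minimal sketch of how such a query might be assembled on the Python side. The `osm` table and `ref_point` column come from the README; the `name` and `amenity` columns, the function name, and the result limit are assumptions made for illustration, and the app's real query may differ.

```python
def nearest_amenities_query(amenity, lon, lat, radius_m=5000.0, limit=10):
    """Return (sql, params) for a nearest-amenity lookup.

    Assumed schema: table `osm` with a GEOGRAPHY column `ref_point` (from the
    README) plus hypothetical `name` and `amenity` columns.
    """
    sql = """
        SELECT name,
               ST_Distance(ST_MakePoint(%(lon)s, %(lat)s)::GEOGRAPHY, ref_point) AS dist_m,
               ST_Y(ref_point::GEOMETRY) AS lat,
               ST_X(ref_point::GEOMETRY) AS lon
        FROM osm
        WHERE amenity = %(amenity)s
          AND ST_DWithin(ST_MakePoint(%(lon)s, %(lat)s)::GEOGRAPHY, ref_point, %(radius_m)s)
        ORDER BY dist_m
        LIMIT %(limit)s
    """
    params = {
        "amenity": amenity,
        "lon": lon,          # ST_MakePoint takes x (longitude) first
        "lat": lat,
        "radius_m": radius_m,
        "limit": limit,
    }
    return sql, params
```

The pair can be passed to a psycopg2 cursor as `cur.execute(sql, params)`; the named placeholders keep the longitude/latitude order explicit.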
+
+These types, operators, and the GIN index are familiar to users of
+[PostGIS](https://postgis.net/), the popular spatial extension available for
+PostgreSQL. In CockroachDB, this layer was written from scratch rather than
+built on PostGIS, but the PostGIS API was preserved.
+
+One aspect of CockroachDB's spatial capability is especially interesting: the
+way the spatial index works. In order to preserve CockroachDB's unique ability
+to scale horizontally by adding nodes to a running cluster, its approach to
+spatial indexing is to decompose the space being indexed into buckets of
+various sizes. A deeper discussion of this topic is available
+[here](https://www.cockroachlabs.com/docs/v20.2/spatial-indexes).
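The idea of "buckets of various sizes" can be illustrated with a toy quadtree over latitude and longitude. This is only a sketch of the general divide-the-space technique, not CockroachDB's actual indexing scheme (see the linked documentation for that); the function name and the quadrant numbering are made up for this example.

```python
def cell_id(lat, lon, level):
    """Return a string of quadrant digits locating (lat, lon) at a given depth.

    Each extra digit halves the cell in both directions, so a longer ID
    names a smaller bucket nested inside the shorter one.
    """
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(level):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:      # northern half
            quadrant |= 2
            south = mid_lat
        else:
            north = mid_lat
        if lon >= mid_lon:      # eastern half
            quadrant |= 1
            west = mid_lon
        else:
            east = mid_lon
        digits.append(str(quadrant))
    return "".join(digits)


coarse = cell_id(51.5235613, -0.1355134, 2)   # large bucket
fine = cell_id(51.5235613, -0.1355134, 12)    # small bucket inside it
```

Because the fine cell's ID starts with the coarse cell's ID, an index keyed on such IDs can hold large and small shapes side by side as ordinary sortable keys.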

 <img src="./mobile_view.png" width="360" alt="Running on iPhone">
+(App running on an iPhone, in Safari)

 ## Setup

+The demo can be run locally, in a Docker container, or in K8s.
+
 [Data set](https://storage.googleapis.com/crl-goddard-gis/osm_1m_eu.txt.gz): 1m
 points from OpenStreetMap's Planet Dump, all in Europe

deploy_k8s.sh

-30
This file was deleted.
File renamed without changes.
File renamed without changes.

k8s/deploy_k8s.sh

+66
@@ -0,0 +1,66 @@
#!/bin/bash

# 2 vCPU, 8 GB RAM, $0.075462/hour
MACHINETYPE="e2-standard-2"
NAME="${USER}-geo-tourist"
ZONE="us-east4-b"

# Create the GKE K8s cluster
gcloud container clusters create "$NAME" --zone="$ZONE" --machine-type="$MACHINETYPE" --num-nodes=4

# Look up the active gcloud account from the "Account: [...]" line
ACCOUNT=$( gcloud info | perl -ne 'print "$1\n" if /^Account: \[([^@]+@[^\]]+)\]$/' )

# Grant that account cluster-admin rights
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user="$ACCOUNT"

# Create the CockroachDB cluster
YAML="https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset.yaml"
kubectl apply -f "$YAML"

# Initialize DB / cluster
YAML="https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cluster-init.yaml"
kubectl apply -f "$YAML"

cat <<'EoM'

WAIT until the output from "kubectl get pods" shows a status of "Running" for
the three 'cockroachdb-N' nodes; e.g.

$ kubectl get pods
NAME                 READY   STATUS      RESTARTS   AGE
cluster-init-67frx   0/1     Completed   0          8h
cockroachdb-0        1/1     Running     0          8h
cockroachdb-1        1/1     Running     0          8h
cockroachdb-2        1/1     Running     0          8h

EoM

# Pause until the operator confirms the CockroachDB pods are running
read -r -p "Press Enter once the three cockroachdb pods are Running: "

# Create table, index, and load data
YAML="./data-loader.yaml"
kubectl apply -f "$YAML"

cat <<'EoM'

WAIT until "kubectl get pods" shows "Completed" for the loader process; e.g.

$ kubectl get pods
NAME              READY   STATUS      RESTARTS   AGE
crdb-geo-loader   0/1     Completed   0          7h2m

EoM

# Pause until the operator confirms the loader has finished
read -r -p "Press Enter once crdb-geo-loader shows Completed: "

# Start the Web UI
YAML="./crdb-geo-tourist.yaml"
kubectl apply -f "$YAML"

# Pause so the demo stays up until the operator is done with it
read -r -p "Press Enter to tear everything down: "

# Tear it all down
YAML="./crdb-geo-tourist.yaml"
kubectl delete -f "$YAML"
YAML="./data-loader.yaml"
kubectl delete -f "$YAML"
YAML="https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset.yaml"
kubectl delete -f "$YAML"
YAML="https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cluster-init.yaml"
kubectl delete -f "$YAML"

gcloud container clusters delete "$NAME" --zone="$ZONE" --quiet
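The "WAIT until ... Running" step in the script is manual. As one possible way to automate it, the sketch below builds a `kubectl wait --for=condition=Ready` command for the three `cockroachdb-N` pods. The helper names and the 600-second timeout are assumptions, and the pods must already exist when the command runs, otherwise `kubectl wait` reports them as not found.

```python
import subprocess


def wait_for_ready_cmd(pods, timeout_s=600):
    """Build the kubectl command that blocks until every pod is Ready."""
    return [
        "kubectl", "wait",
        "--for=condition=Ready",
        f"--timeout={timeout_s}s",
        *[f"pod/{name}" for name in pods],
    ]


def wait_for_ready(pods, timeout_s=600):
    """Run the command; raises CalledProcessError on timeout or failure."""
    subprocess.run(wait_for_ready_cmd(pods, timeout_s), check=True)


# Pod names taken from the sample "kubectl get pods" output in the script.
CRDB_PODS = [f"cockroachdb-{i}" for i in range(3)]
```

Calling `wait_for_ready(CRDB_PODS)` would replace the first manual pause. The loader pod ends in `Completed` rather than `Ready`, so it would need a different condition and is left out of this sketch.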

osm/OSM_extracted_region.jpg

337 KB

osm/draw_osm_bounds.html

+69
@@ -0,0 +1,69 @@
<!DOCTYPE html>
<html>
  <head>
    <title>Rectangles</title>
    <script src="https://polyfill.io/v3/polyfill.min.js?features=default"></script>
    <script
      src="https://maps.googleapis.com/maps/api/js?key=YOUR_GOOGLE_MAPS_API_KEY&callback=initMap&libraries=&v=weekly"
      defer
    ></script>
    <style type="text/css">
      /* Always set the map height explicitly to define the size of the div
       * element that contains the map. */
      #map {
        height: 100%;
      }

      /* Optional: Makes the sample page fill the window. */
      html,
      body {
        height: 100%;
        margin: 0;
        padding: 0;
      }
    </style>
    <script>
      // Draws a rectangle around the extracted OSM region and fits the map to it.
      function initMap() {
        // Give Polygon a getBounds() helper. This is defined inside initMap
        // because the Maps API is loaded with "defer", so the "google" object
        // does not exist yet while the page's inline scripts are parsed.
        google.maps.Polygon.prototype.getBounds = function () {
          var bounds = new google.maps.LatLngBounds();
          var paths = this.getPaths();
          var path;
          for (var i = 0; i < paths.getLength(); i++) {
            path = paths.getAt(i);
            for (var ii = 0; ii < path.getLength(); ii++) {
              bounds.extend(path.getAt(ii));
            }
          }
          return bounds;
        };

        const map = new google.maps.Map(document.getElementById("map"), {
          zoom: 11,
          center: { lat: 52.68738, lng: 11.00000 },
          mapTypeId: "terrain",
        });
        const rectangle = new google.maps.Rectangle({
          strokeColor: "#2915a5",
          strokeOpacity: 0.8,
          strokeWeight: 2,
          fillColor: "#f0f0f0",
          fillOpacity: 0.2,
          map,
          bounds: {
            north: 72.253800,
            south: 33.120960,
            east: 34.225994,
            west: -12.666450,
          },
        });
        map.fitBounds(rectangle.getBounds());
      }
    </script>
  </head>
  <body>
    <div id="map"></div>
  </body>
</html>
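The rectangle drawn by this page can also serve as a filter when extracting points. A minimal sketch, using the same four bounds as the HTML above; the `Bounds` type and `contains` helper are hypothetical names, and the check assumes the box does not cross the antimeridian.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    north: float
    south: float
    east: float
    west: float


# Same values as the rectangle in draw_osm_bounds.html
OSM_BOUNDS = Bounds(north=72.253800, south=33.120960, east=34.225994, west=-12.666450)


def contains(bounds, lat, lon):
    """True if (lat, lon) lies inside the box, edges included."""
    return bounds.south <= lat <= bounds.north and bounds.west <= lon <= bounds.east
```

For example, the London pub from the sample OSM data (51.5235613, -0.1355134) falls inside the box, while a point in New York (40.7, -74.0) does not.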

osm_crdb.sql → osm/osm_crdb.sql

File renamed without changes.

osm/osm_xml_to_json.py

+153
@@ -0,0 +1,153 @@
#!/usr/bin/env python3

"""

  * Generate a geohash of the (lat, lon)
  * Add the geohash and some shortened versions to the JSON (See https://github.com/vinsci/geohash/)
    - pip3 install Geohash
    - geohash = Geohash.encode(lat, lon)
    - geohash precision: https://gis.stackexchange.com/questions/115280/what-is-the-precision-of-a-geohash
  * Store entire record as JSONB
  * Index the JSONB
  * Pull out id, timestamp, the 4-char geohash, and a GEOGRAPHY as separate columns
  * PK => (geohash4, id)
  * Pub review sources:
    - https://whatpub.com/

  * ./osm_xml_to_json.py extracted.osm.bz2 5000000 5567.39s user 28.17s system 6% cpu 25:15:00.26 total
  * cockroach dump defaultdb osm_json --insecure 279.00s user 61.36s system 179% cpu 3:10.11 total
  * gzip - > backup.sql.gz 57.24s user 1.92s system 31% cpu 3:10.11 total

  * DDL:

    DROP TABLE IF EXISTS osm_json;
    CREATE TABLE osm_json
    (
      id STRING
      , geo_20km CHAR(4)
      , ts TIMESTAMP
      , x GEOGRAPHY
      , obj JSONB
      , PRIMARY KEY (geo_20km ASC, id ASC)
    );

"""

import sys
import os
import json
import bz2
import re
import Geohash
import psycopg2
import psycopg2.errorcodes
import psycopg2.extras
import html
import time

# Example input data
"""
  <node id="114" version="4" timestamp="2018-07-21T22:01:43Z" uid="207581" user="Hjart" changeset="60940511" lat="59.9506757" lon="10.784339"/>
  <node id="115" version="3" timestamp="2018-07-21T22:01:43Z" uid="207581" user="Hjart" changeset="60940511" lat="59.9510531" lon="10.7796921"/>
  ...
  <node id="108042" version="22" timestamp="2019-11-07T19:01:18Z" uid="2773866" user="kreuzschnabel" changeset="76773501" lat="51.5235613" lon="-0.1355134">
    <tag k="name" v="Simmons"/>
    <tag k="amenity" v="pub"/>
    <tag k="toilets" v="yes"/>
    <tag k="old_name" v="The Jeremy Bentham"/>
    <tag k="addr:street" v="University Street"/>
    <tag k="addr:postcode" v="WC1E 6JL"/>
    <tag k="contact:phone" v="+44 20 73771843"/>
    <tag k="opening_hours" v="Mo-We 16:00-23:30; Th-Fr 16:00-01:00; Sa 16:00-23:30"/>
    <tag k="contact:website" v="http://www.simmonsbar.co.uk/euston-square/4593769006"/>
    <tag k="addr:housenumber" v="31"/>
  </node>

"""

if len(sys.argv) != 3:
    print("Usage: %s osm_xml.bz2 max_points" % sys.argv[0])
    sys.exit(1)

in_file = sys.argv[1]
max_points = int(sys.argv[2])

conn = psycopg2.connect(
    database=os.getenv("PGDATABASE", "defaultdb")
    , user=os.getenv("PGUSER", "root")
    , port=int(os.getenv("PGPORT", "26257"))
    , host=os.getenv("PGHOST", "localhost")
    , application_name="OSM JSON"
)
conn.autocommit = True

# rows is a list of lists: [id, geo_20km, timestamp, lon, lat, JSON string]
def do_inserts(conn, rows):
    t0 = time.time()
    with conn.cursor() as cur:
        # ST_MakePoint takes x (longitude) first, then y (latitude)
        psycopg2.extras.execute_batch(
            cur,
            "INSERT INTO osm_json (id, geo_20km, ts, x, obj) "
            "VALUES (%s, %s, %s, ST_MakePoint(%s, %s)::GEOGRAPHY, %s::JSONB)",
            rows, page_size=100)
    print("INSERTED %d rows in %.2f seconds" % (len(rows), time.time() - t0))

n_read = 0
kv = {}
node = {}
batch_size = 1000

# See above data examples for how this is derived
node_pat = re.compile(r'<node id="([^"]+)" version="(\d+)" timestamp="([^"]+)" uid="(\d+)" user="([^"]+)" changeset="(\d+)" lat="(-?\d+\.\d+)" lon="(-?\d+\.\d+)">')
tag_pat = re.compile(r'^<tag +k="([^"]+)" +v="([^"]+)" */>$')

rows = []
with bz2.open(in_file, mode="rt", encoding="utf8", newline='\n') as f:
    # Iterating over the file ends cleanly at EOF, even when the input holds
    # fewer than max_points named nodes
    for line in f:
        if n_read >= max_points:
            break
        line = line.strip()
        if line.startswith("</node>"):
            if not node or not kv:  # Is either empty?
                continue
            if "name" in kv:  # I think it's interesting only if it has a name
                node["kv"] = kv
                rows.append([node["id"], node["geo_20km"], node["timestamp"], node["lon"], node["lat"], json.dumps(node)])
                if len(rows) == batch_size:
                    do_inserts(conn, rows)
                    rows = []
                #print(json.dumps(node))
                n_read += 1
        elif line.startswith("<node "):
            if line.endswith("/>"):  # Self-closing node: no tags, so skip it
                continue
            node.clear()
            kv.clear()
            m = node_pat.match(line)
            if m is not None:
                node["id"] = m.group(1)
                node["version"] = m.group(2)
                node["timestamp"] = m.group(3)
                node["uid"] = m.group(4)
                node["user"] = html.unescape(m.group(5))
                node["changeset"] = m.group(6)
                lat = float(m.group(7))
                lon = float(m.group(8))
                node["lat"] = lat
                node["lon"] = lon
                geohash = Geohash.encode(lat, lon)
                # Add some geohash values. The "20km" suffix means it's accurate to +/- 20 kilometers
                node["geo_20m"] = geohash[0:8]
                node["geo_2400m"] = geohash[0:5]
                node["geo_20km"] = geohash[0:4]
                node["geo_80km"] = geohash[0:3]
        elif line.startswith("<tag "):
            m = tag_pat.match(line)
            if m is not None:
                kv[m.group(1)] = html.unescape(str(m.group(2)))
if rows:
    do_inserts(conn, rows)
conn.close()
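The script leans on the fact that truncating a geohash gives a coarser cell containing the original point. The sketch below is a small stand-alone encoder that shows why: each character packs five alternating longitude/latitude bisection bits, so a prefix is simply fewer bisections. It is an illustrative re-implementation of the standard geohash algorithm under the hypothetical name `geohash_encode`, not the `Geohash` package the script imports.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode (lat, lon) as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits start with longitude, then alternate
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


full = geohash_encode(51.5235613, -0.1355134, 8)  # plays the role of geo_20m
geo_20km = full[0:4]                              # same slicing as the script
```

Slicing the 8-character hash to 4 characters gives the same string as encoding directly at precision 4, which is what makes `geo_20km` usable as a locality-preserving primary-key prefix.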
