Skip to content

adamhooper/opencensus

Repository files navigation

== How I did certain things

=== The "regions" table

First, imported tables into "regions". Used ogr2ogr to import individual census files into their own tables, and then ran INSERT INTO ... SELECT ... to query from those individual tables into the "regions" table.

There's one problem: Divisions don't have EconomicRegion parents, even though StatsCan's hierarchy diagram says they should. The reason is one exception: Halton, ON (west of Mississauga), which straddles two EconomicRegions. We'll set the parent to a random one of those two, since following StatsCan's hierarchy makes the rest of the code quicker and easier. UPDATE regions SET economic_region_uid = (SELECT MAX(r2.economic_region_uid) FROM regions r2 WHERE r2.type = 'Subdivision' AND r2.division_uid = regions.uid) WHERE type = 'Division';

The "regions" table is our canonical data source. It's slow to create tiles from it, though. So we do speedups.

==== Speedups: the "region_polygons" table

When fetching a tile, we need to quickly find a pre-rendered list of polygons at the given extent.

To create "region_polygons": `INSERT INTO region_polygons (region_id, polygon) SELECT id, ST_Dump(geography).geom FROM regions;`. For any MultiPolygon, the ST_Dump() will create one row in region_polygons for each inner polygon.

Add an area, in square metres. `UPDATE region_polygons SET area_in_m = ST_Area(ST_SetSRID(polygon, 4326));`

Add is_island, because it's helfpul for making the map look right. If an island takes up 40 pixels, people will see it and want to see data on it; for a 40-pixel polygon that isn't an island, it might be okay to show only the polygon's parent's data.

With is_island and area_in_m, we can decide which polygons to show at which zoom levels:

1. Set min_zoom_level for every polygon, up to 15 (because 15 is our max zoom level). Do this by finding out how many pixels the polygon would fill at that zoom level, and turning it on if it fills enough.
2. Set the same min_zoom_level for all polygons with the same parent region. That means all or no tracts within a metropolitan area will appear, for instance, or all or no dissemination blocks within an area.

Why #1? Because we don't want every polygon showing up at every zoom level. Why #2? Because we don't want only *some* sub-regions of a given region showing up. (Also, we don't want only some *polygons* of a single region showing up, and #2 happens to solve that problem, too.)

With that, we know which polygons we want to render at which zoom levels. Next, we need to render those polygons into tiles. See tile_renderer/ for that.

Then simplify, adding to separate tables so that neighbouring tiles use rows from the same general spot: `SELECT id, region_id, min_longitude, max_longitude, min_latitude, max_latitude, area_in_m, is_island, ST_SimplifyPreserveTopology(polygon, 0.46875000000000000000000) INTO region_polygons_zoom18';

Finally, add indices: on boundaries, area_in_m and is_island. These are what we'll use in our query conditions.

The end result? We can select the appropriate row and return the geometry with ST_AsGeoJSON(). Most of the calculations are already done.

==== Validity

Sometimes ST_SimplifyPreserveGeometry() introduces errors. These will give "side location conflict" errors when using ST_Intersection(). Fix the the errors with ST_Buffer(geometry, 0.0). For instance: `UPDATE region_polygons_zoom18 SET polygon = ST_Buffer(polygon, 0) WHERE ST_IsValid(polygon) IS FALSE`

=== Data

* Download data into CSV format (for 2001, where the only public format is HTML) via script/scrape/
* Import CSVs using script/import_all_csvs
* Download 2006 data from the URL shown in script/import-2006-popdwellings, and use that same script to import the .txt file
* Run script/sanitize_indicators to fix un-sane values
* Run script/preprocess_indicators to calculate custom indicators: population/area, dwellings/area, etc.

=== Tiles

See tile_renderer/README.