This is a suite of tools used to synchronize data for all websites in the GFW platform, including but not limited to the GFW Flagship, GFW Commodities, the Open Data Portal and various CartoDB and ArcGIS Server endpoints.
The data update process is driven by layers. Each layer has configuration options defined in the `gfw-sync2` config table. When we have new source data for a layer (e.g. tiger conservation landscapes), we can update it across the platform by running:
```
python gfw-sync2 -e prod -l tiger_conservation_landscapes
```
This will take the options defined on the `PROD` tab of the config table and process the specified layer. The script uses the config table to copy the data locally, apply a field map, add a country code, and then append the result to various Esri and CartoDB tables.
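The per-layer steps above can be sketched as a small, self-contained function. This is an illustrative mock working on in-memory records, not the actual gfw-sync2 API; the function names, record format, and config keys mirror the config-table attributes but are otherwise assumptions.

```python
# Hypothetical sketch of one layer update: copy locally, apply a field map,
# add a country code, append to the configured outputs. Record format and
# helper names are illustrative, not the real gfw-sync2 implementation.

def update_layer(config, source_records, outputs):
    """Run one layer's update against in-memory record lists."""
    # 1. "Copy locally": work on a copy so the source is untouched
    local = [dict(r) for r in source_records]
    # 2. Apply the field map (source field name -> output field name)
    local = [{config["field_map"].get(k, k): v for k, v in r.items()}
             for r in local]
    # 3. Stamp every record with the layer's ISO country code
    for r in local:
        r["country"] = config["add_country_value"]
    # 4. Append to each configured output table
    for out in (config["esri_service_output"],
                config["cartodb_service_output"]):
        outputs[out].extend(local)

config = {
    "field_map": {"NAME": "name"},
    "add_country_value": "IDN",
    "esri_service_output": "esri_table",
    "cartodb_service_output": "carto_table",
}
outputs = {"esri_table": [], "carto_table": []}
update_layer(config, [{"NAME": "Sumatra block"}], outputs)
```

In the real tool these steps operate on Esri/CartoDB datasets rather than Python lists, but the ordering of the steps is the same.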
In addition to processing input country datasets, this process also updates associated global datasets. Whenever a dataset of type `country_vector` is updated, the layer specified in its `global_layer` field is also updated: the previous records for that country dataset are deleted, and the new data is appended.
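The delete-then-append step for a global layer can be shown with a minimal in-memory sketch; the record shape and field name are assumptions for illustration only.

```python
# Sketch of the global-layer update: drop the country's old records from
# the global table, then append its new records. In-memory dicts stand in
# for the real global dataset; names are illustrative.

def update_global_layer(global_table, country_code, new_records):
    """Delete the country's previous rows, then append its new rows."""
    kept = [r for r in global_table if r["country"] != country_code]
    return kept + list(new_records)

global_table = [
    {"country": "PER", "area": 10},
    {"country": "IDN", "area": 7},
]
global_table = update_global_layer(global_table, "PER",
                                   [{"country": "PER", "area": 12}])
```

Deleting before appending keeps the global table free of stale duplicates when a country dataset is re-run.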
Other info: layers can be set to update automatically based on the `update_days` field. A nightly cron job on the data management server (running `utilities\cronjob.cmd`) compares today's date to the value in `update_days` to determine whether the layer should be updated. Logs for these processes (and all updates) are written to the `\logs` dir (not included in this repo).
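The nightly `update_days` check could look something like the sketch below. It handles the two value formats documented in the config table (`[1-10]` as a range, `[1,5,10]` as an explicit list); the parsing details are assumptions, not the actual cron-job code.

```python
# Hypothetical check used by a nightly job: does today's day-of-month
# match the layer's update_days value? Supports "[1-10]" (range) and
# "[1,5,10]" (explicit days); parsing is illustrative.
import datetime

def should_update(update_days, today=None):
    """Return True if today's day-of-month matches update_days."""
    day = (today or datetime.date.today()).day
    spec = update_days.strip("[]")
    if "-" in spec:                       # "[1-10]" -> every day 1 through 10
        lo, hi = (int(x) for x in spec.split("-"))
        return lo <= day <= hi
    days = {int(x) for x in spec.split(",")}  # "[1,5,10]" -> exact days only
    return day in days
```

A layer with no `update_days` value would simply be skipped by the nightly job and updated manually instead.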
Attribute | Description |
---|---|
tech_title | Layer title |
type | Must match the options defined in layer_decision_tree.py |
add_country_value | ISO country code, required for country_vector layers |
source | Path to the source dataset |
transformation | Any transformations that need to be applied to the source |
delete_features_input_where_clause | A where clause used to filter features from the source |
merge_where_field | Generates the list of distinct values of this field in the source table (e.g. field: country, value: PER), deletes all records with those values from the esri_service_output and cartodb_service_output datasets, then appends the source. If not specified, the output data is truncated and then appended to |
esri_service_output | Esri output to append the source to |
cartodb_service_output | CartoDB output to append to |
archive_output | path to the output archive ZIP created |
download_output | path to the download ZIP created |
field_map | A .ini file used to map fields from source to outputs |
tile_cache_output | location for storage of tile cache generated |
update_days | Numeric days of the month to check for updates. Can be [1-10] (run on all days 1-10) or [1,5,10] (run on the 1st, 5th, and 10th of each month) |
global_layer | If this dataset is part of a global layer, specify its tech_title here |
last_updated | Automatically updated by the script when a layer is updated |
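The `merge_where_field` behavior described in the table can be sketched as a delete-by-value-then-append against an in-memory table; the record format and names are illustrative stand-ins for the real Esri/CartoDB outputs.

```python
# Sketch of merge_where_field: collect the distinct values of that field in
# the source, delete matching rows from the output, then append the source.
# If no field is given, truncate the output and append. Illustrative only.

def merge_append(output, source, merge_where_field=None):
    """Delete-by-field-value then append, or truncate-and-append."""
    if merge_where_field:
        values = {r[merge_where_field] for r in source}
        output[:] = [r for r in output
                     if r[merge_where_field] not in values]
    else:
        output.clear()                    # no field set: truncate the output
    output.extend(source)

table = [{"country": "PER", "ha": 1}, {"country": "BRA", "ha": 2}]
merge_append(table, [{"country": "PER", "ha": 5}], "country")
```

This is what lets a single country's refresh replace only that country's rows in a shared output table, while leaving other countries untouched.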