OSM Changeset Analyser, `osmcha`, is a Python package to detect suspicious OSM changesets. It was designed to be used with osmcha-django, but it can also be used standalone or in other projects.
pip install osmcha
You can read a replication changeset file directly from the web:
from osmcha.changeset import ChangesetList
c = ChangesetList('https://planet.openstreetmap.org/replication/changesets/002/236/374.osm.gz')
or from your local filesystem:
c = ChangesetList('tests/245.osm.gz')
`c.changesets` will return a list containing the data of all the changesets listed in the file.
You can filter the changesets by passing a GeoJSON file containing a polygon of your area of interest as the second argument to `ChangesetList`.
Finally, to analyse a specific changeset, do:
from osmcha.changeset import Analyse
ch = Analyse(changeset_id)
ch.full_analysis()
You can customize the detection rules by defining your preferred values when initializing the `Analyse` class. The default values are shown below.
ch = Analyse(changeset_id, create_threshold=200, modify_threshold=200,
             delete_threshold=30, percentage=0.7, top_threshold=1000,
             suspect_words=[...], illegal_sources=[...], excluded_words=[...])
The command line interface can be used to verify a specific changeset directly from the terminal.
Usage: osmcha <changeset_id>
`osmcha` works by analysing how many map features the changeset created, modified or deleted, and by checking for suspect words in the `comment`, `source` and `imagery_used` fields of the changeset. It also considers whether the editing software allows data imports or mass edits. The editors considered powerful are JOSM, Merkaartor, level0, QGIS and ArcGIS.
In the Usage section, you can see how to customize some of these detection rules.
We tag a changeset as a possible import if the number of created elements is greater than 70% of the sum of elements created, modified and deleted, and if it creates more than 1000 elements, or more than 200 elements when one of the powerful editors was used.
We consider a changeset a mass modification if the number of modified elements is greater than 70% of the sum of elements created, modified and deleted, and if it modifies more than 200 elements.
All changesets that delete more than 1000 elements are considered a mass deletion. If the changeset deletes between 200 and 1000 elements and the number of deleted elements is greater than 70% of the sum of elements created, modified and deleted, it is also tagged as a mass deletion.
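The three count-based rules above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the package's own code: `classify_changeset` and `POWERFUL_EDITORS` are hypothetical names, the editor is matched by a simple substring test, and `delete_threshold` is set to 200 to follow the prose here (the defaults listed in the usage example show a different value).

```python
# Editors named in the text as able to import data or do mass edits.
# The lowercase substring matching is an assumption made for this sketch.
POWERFUL_EDITORS = ("josm", "merkaartor", "level0", "qgis", "arcgis")


def classify_changeset(created, modified, deleted, editor="",
                       create_threshold=200, modify_threshold=200,
                       delete_threshold=200, percentage=0.7,
                       top_threshold=1000):
    """Return the labels the rules described above would assign.

    Hypothetical helper for illustration; not part of the osmcha API.
    """
    total = created + modified + deleted
    if total == 0:
        return []

    powerful = any(name in editor.lower() for name in POWERFUL_EDITORS)
    labels = []

    # Possible import: mostly creations, and either above the top threshold
    # or above the lower threshold when a powerful editor was used.
    if created / total > percentage and (
        created > top_threshold or (powerful and created > create_threshold)
    ):
        labels.append("possible import")

    # Mass modification: mostly modifications and above the modify threshold.
    if modified / total > percentage and modified > modify_threshold:
        labels.append("mass modification")

    # Mass deletion: always above the top threshold, otherwise only when
    # deletions dominate the changeset and exceed the lower threshold.
    if deleted > top_threshold or (
        deleted > delete_threshold and deleted / total > percentage
    ):
        labels.append("mass deletion")

    return labels
```

For instance, `classify_changeset(300, 10, 10, editor="JOSM/1.5")` is labelled a possible import, while the same counts with an editor outside the list get no label.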
The suspect words are loaded from a YAML file. You can customize the words by pointing to another default file with an environment variable, or by passing a list of words to the `Analyse` class; see the section Customizing Detection Rules for more information. We use a list of illegal sources to analyse the `source` and `imagery_used` fields and another, more general list to examine the `comment` field. We also have a list of excluded words to avoid false positives.
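To illustrate how an excluded-words list can prevent false positives, here is a minimal sketch. It assumes plain case-insensitive substring matching, which may differ from how the package actually matches words; `find_suspect_words` is a hypothetical helper and the word lists in the usage note are made up for the example.

```python
def find_suspect_words(text, suspect_words, excluded_words=()):
    """Return the suspect words found in text, ignoring excluded phrases.

    Hypothetical helper for illustration; not the osmcha implementation.
    """
    lowered = text.lower()
    # Blank out excluded phrases first so they cannot trigger a match.
    for phrase in excluded_words:
        lowered = lowered.replace(phrase.lower(), " ")
    # Report every suspect word that still appears in the remaining text.
    return [word for word in suspect_words if word.lower() in lowered]
```

With `suspect_words=["import"]`, the comment "Important fix to road names" is flagged unless "important" is in `excluded_words`, while "Import of buildings" is flagged either way.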
For changesets made with the iD editor, the host instance is also checked. The trusted iD instances are: OSM.org, Strava and ImproveOSM.
To run the tests on `osmcha`:
git clone https://github.com/willemarcel/osmcha.git
cd osmcha
pip install -e ".[test]"
py.test -v
Check CHANGELOG.RST for the version history.
GPLv3