starrovoyt/simple_anomaly_detector_via_pyspark

Simple anomaly detector via PySpark

Tiny description:

This program implements a distributed anomaly detection algorithm, written in the map-reduce paradigm, for user identifiers: IDFA (iOS advertising ID, generated as a UUID), GAID (Android advertising ID, generated as a UUID), email, and login.

Idea: generate factors based only on the representation of an identifier (its textual form) and the number of its connections with other identifiers.
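As an illustration of the idea, here is a minimal sketch in plain Python. It is not the repository's code: the specific factors (length, character entropy, a UUID-format check) and the helper names `shannon_entropy`, `is_canonical_uuid`, `degree_counts`, and `build_factors` are assumptions chosen for the example. In a Spark job the same per-identifier logic could be applied inside a map step, with the connection counts coming from a reduce over the edges.

```python
import math
import uuid
from collections import Counter


def shannon_entropy(text):
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_canonical_uuid(text):
    """True if text parses as a UUID and matches the canonical 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


def degree_counts(edges):
    """Count how many edges touch each identifier in an iterable of (a, b) pairs."""
    degrees = Counter()
    for left, right in edges:
        degrees[left] += 1
        degrees[right] += 1
    return degrees


def build_factors(identifier, degrees):
    """Hypothetical factor set: representation-based factors plus connection count."""
    return {
        "length": len(identifier),
        "entropy": shannon_entropy(identifier),
        "is_uuid": is_canonical_uuid(identifier),
        "degree": degrees.get(identifier, 0),
    }
```

For example, an all-zero advertising ID that is connected to a very large number of emails would show up with a valid UUID format, unusually low entropy, and an unusually high degree.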

How to run:

Main module – anomaly_detection/main.py

Before running, export the following environment variables: SPARK_MASTER, DATA_DIR, EDGES_TABLE_NAME

where

DATA_DIR – full path to the directory that stores the data tables, EDGES_TABLE_NAME – just the name of the table that stores the edges data.
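The sketch below shows one way a script could read these three variables and fail early when one is missing. It is an assumption, not the contents of anomaly_detection/main.py: the `load_settings` helper is hypothetical, and so is the guess that the edges table lives directly under DATA_DIR.

```python
import os

# Variable names taken from the instructions above.
REQUIRED_VARS = ("SPARK_MASTER", "DATA_DIR", "EDGES_TABLE_NAME")


def load_settings(env=None):
    """Read the required variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "spark_master": env["SPARK_MASTER"],
        # Assumption: the edges table is stored directly inside DATA_DIR.
        "edges_path": os.path.join(env["DATA_DIR"], env["EDGES_TABLE_NAME"]),
    }
```

Passing a plain dictionary instead of the real environment, for instance `load_settings({"SPARK_MASTER": "local[*]", "DATA_DIR": "/data", "EDGES_TABLE_NAME": "edges"})`, makes the helper easy to try out; the values here are illustrative only.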
