GitHub - asksmruti/s3-monitoring: Monitor S3 data arrival (Glue/Athena)

S3 bucket monitoring

The attached code is a solution to the problem scenario described below, which may be helpful in some cases.

Problem

Monitor whether new files are being added to an S3 bucket.

Which buckets to monitor:
- Only buckets whose data exists and is queryable from Athena
How to get the bucket prefix:
- From the Glue metastore
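
As a rough illustration of reading prefixes from the Glue metastore, the sketch below pages through the tables of one database and splits each table's storage location into a bucket and a prefix. It is not taken from the repository: the function names `split_s3_location` and `list_table_locations` are hypothetical, and it assumes every monitored table stores an `s3://` URI in `StorageDescriptor.Location`.

```python
from urllib.parse import urlparse


def split_s3_location(location):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/').

    Returns None for anything that is not an S3 URI.
    """
    parsed = urlparse(location)
    if parsed.scheme not in ("s3", "s3a", "s3n") or not parsed.netloc:
        return None
    return parsed.netloc, parsed.path.lstrip("/")


def list_table_locations(glue_client, database_name):
    """Map table name -> (bucket, prefix) for one Glue database.

    glue_client is expected to behave like boto3.client("glue").
    """
    locations = {}
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page.get("TableList", []):
            # Tables without a storage descriptor (or a non-S3 location) are skipped.
            location = table.get("StorageDescriptor", {}).get("Location", "")
            target = split_s3_location(location)
            if target is not None:
                locations[table["Name"]] = target
    return locations
```

With real credentials this would be called as `list_table_locations(boto3.client("glue"), "my_database")`, where `my_database` is a placeholder database name.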
How to get the latest data arrival date:
- In some cases the Glue metastore may not have information about data arrival if the crawler did not run, so the boto3 paginator is used to list the objects instead. This can also be done through Amazon S3 Inventory if the number of objects is very large.
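
The paginator approach can be sketched as follows. This is a minimal example under stated assumptions rather than the repository's own code: the function name `latest_arrival` is hypothetical, and the client is assumed to behave like `boto3.client("s3")`, whose `list_objects_v2` paginator returns a `LastModified` timestamp for each object.

```python
def latest_arrival(s3_client, bucket, prefix):
    """Return the newest LastModified timestamp under bucket/prefix.

    Returns None when the prefix contains no objects. Every object under
    the prefix is listed, so the cost grows with the object count.
    """
    latest = None
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from the response when nothing matches.
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest:
                latest = obj["LastModified"]
    return latest
```

Because each call walks the whole prefix, this is the step that becomes slow for very large buckets, which is where S3 Inventory is the cheaper alternative.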
What it will do:
- If it finds any S3 prefix that has not been updated with a new file, it generates a report
- The report is accessible through a REST API, a dashboard, and email notification
- A scheduler can be enabled to send the email notification
- Every hit to the API refreshes the data
- Logging is also enabled for monitoring purposes
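
The report step could look roughly like the sketch below. The README does not say how "not updated" is decided, so the `max_age` threshold is an assumption, as are the function name `build_stale_report` and the report fields. The function takes the newest arrival time per location and returns the entries that are older than the threshold or have no objects at all.

```python
from datetime import datetime, timedelta, timezone


def build_stale_report(latest_by_location, max_age, now=None):
    """List locations whose newest object is older than max_age.

    latest_by_location maps a location string to a timezone-aware datetime,
    or to None when no object was found. max_age is a timedelta
    (an assumed threshold, not something defined by the README).
    """
    now = now or datetime.now(timezone.utc)
    report = []
    for location, latest in sorted(latest_by_location.items()):
        if latest is None:
            # No data at all is reported as stale with an unknown age.
            report.append({"location": location, "last_arrival": None, "age_hours": None})
        elif now - latest > max_age:
            age_hours = round((now - latest).total_seconds() / 3600, 1)
            report.append({
                "location": location,
                "last_arrival": latest.isoformat(),
                "age_hours": age_hours,
            })
    return report
```

The returned list contains only plain strings and numbers, so the same structure can be serialised for a REST response or formatted into an email body.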
How to run:
- Set the AWS profile and the email server details
- Install the prerequisites: pip3 install -r requirements.txt
- Run python3 main.py

An additional caching layer will probably be needed in case the number of objects is very large.
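
One possible shape for such a caching layer is a small time-to-live wrapper around the expensive listing call, so that repeated API hits within a short window reuse the previous result. This is only a sketch of the idea, not something present in the repository: the class name `TTLCache` and its parameters are hypothetical.

```python
import time


class TTLCache:
    """Cache the result of loader() for ttl_seconds.

    loader is any zero-argument callable, for example a function that
    scans all monitored prefixes. clock is injectable to ease testing.
    """

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Reload only when the cached value is missing or expired.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Force the next get() to call loader() again."""
        self._expires_at = None
```

The trade-off is freshness: with a TTL of, say, five minutes, a report may lag the bucket by up to five minutes, in exchange for not re-listing every object on each request.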

How to run the exporter

Run the s3-exporter:

python3 s3-exporter.py

Metrics endpoint:

http://localhost:8000/
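
The contents of s3-exporter.py are not shown here, so the following is only a standard-library sketch of what a metrics endpoint of this kind might serve, not the repository's implementation. It renders one gauge per monitored prefix in the Prometheus text exposition format. The metric name `s3_prefix_last_arrival_timestamp_seconds`, the `bucket` and `prefix` labels, and the helper names are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric name, chosen for illustration only.
METRIC = "s3_prefix_last_arrival_timestamp_seconds"


def _escape(value):
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(latest_by_location):
    """Render {(bucket, prefix): datetime or None} as Prometheus text."""
    lines = [
        f"# HELP {METRIC} Unix time of the newest object under the prefix.",
        f"# TYPE {METRIC} gauge",
    ]
    for (bucket, prefix), latest in sorted(latest_by_location.items()):
        if latest is None:
            continue  # no objects found, so no sample is emitted
        labels = f'bucket="{_escape(bucket)}",prefix="{_escape(prefix)}"'
        lines.append(f"{METRIC}{{{labels}}} {latest.timestamp()}")
    return "\n".join(lines) + "\n"


def make_handler(collect):
    """Build a request handler that serves collect() on every GET path."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = render_metrics(collect()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(collect, host="127.0.0.1", port=8000):
    """Block forever, serving metrics at http://host:port/."""
    HTTPServer((host, port), make_handler(collect)).serve_forever()
```

Calling `serve(collect)` with a function that returns the latest arrival times would expose them on port 8000. In Prometheus, a rule such as `time() - s3_prefix_last_arrival_timestamp_seconds > 86400` could then flag prefixes with no new data for a day.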

Sample Prometheus config setup

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'dl-monitoring'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['127.0.0.1:8000']

  - job_name: 'node-exporter'
