Skip to content

Airflow DAGs for exporting, loading, and parsing the ICON blockchain data.

License

Notifications You must be signed in to change notification settings

robcxyz/icon-etl-airflow

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

icon-etl-airflow

Contents

Usage

To use with terragrunt-icon-analytics:

  • copy contents of ./dags folder into /etc/airflow/dags and ensure airflow is the owner of files
sudo cp ./dags /etc/airflow/dags  # DAGs directory in ~/airflow/dags for docker based deployment
sudo chown -R airflow:airflow /etc/airflow/dags
  • install requirements as the airflow user
sudo su - airflow
pip3 install -r requirements.txt
exit
  • restart airflow-scheduler and airflow-webserver to load new packages
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-webserver

# Or if using docker based deployment 
docker-compose down && docker-compose up -d 
  • Set mandatory variables in the Airflow UI.
  • configure postgres connection in connections tab

Variables

Project-wide

Variable Suggested Default Description
icon_cloud_provider aws The cloud provider you're using for Airflow: either 'aws' or 'gcp'
icon_database postgres The connection name for the database you're inserting data into
icon_output_bucket The name of the output bucket for block data
icon_provider_uris https://ctz.solidwallet.io/api/v3 The URI of the endpoint to use for block data
notification_emails List of email addresses to notify about job status

Export

Variable Default Description
icon_export_start_date 2018-01-24 The first date to start exporting
icon_export_schedule_interval 0 0 * * * The scheduled time for when the export job should run
icon_export_max_active_runs None The maximum number of active export runs
icon_export_max_workers 10 The maximum number of export workers per run
icon_export_batch_size 10 The export batch size
icon_export_retries 5 The number of retries per export run
icon_export_blocks_and_transactions_toggle True Enable exporting of blocks and transactions
icon_export_receipts_and_logs_toggle True Enable exporting of receipts and logs

Load

Variable Default Description
aws_access_key_id Access key ID to use for AWS Redshift
aws_secret_access_key Secret access key to use for AWS Redshift
load_all_partitions true Load all partitions in BigQuery
destination_dataset_project_id GCP Project ID for the BigQuery dataset

Schemas

Further information about the data used in the schemas is available in the ICON JSON RPC documentation.

Primary keys for each table are indicated in bold.

Caution should be exercised when using the timestamp field directly from the API. The time resolution of the epoch timestamp changes throughout. For ease of use, it is suggested to perform a transformation on this column in all tables to use only the left 16 digits (corresponding to the epoch timestamp in seconds).

A suggested transformation query is:

select left(timestamp::text, 10)::int8 as timestamp_s

and can be used to create a transformed column in a new table or materialized view.

Blocks

Field Type
number bigint
hash string
parent_hash string
merkle_root_hash string
timestamp bigint
version string
peer_id string
signature string
next_leader string

Transactions

Field Type
version string
from_address string
to_address string
value numeric(38,0)
step_limit numeric(38,0)
timestamp bigint
nid int
nonce numeric(100,0)
hash string
transaction_index bigint
block_hash string
block_number bigint
fee numeric(38,0)
signature string
data_type string
data string

Logs

Field Type
log_index int
transaction_hash string
transaction_index int
block_hash string
block_number int
address string
data string
indexed string

Receipts

Field Type
transaction_hash string
transaction_index int
block_hash string
block_number int
cumulative_step_used numeric(38,0)
step_used numeric(38,0)
step_price numeric(38,0)
score_address string
status string

Credits

ICON ETL Airflow was written by Richard Mah (@shinyfoil) for Insight Infrastructure, and was based on Ethereum ETL Airflow. The development was funded by the ICON Foundation.

About

Airflow DAGs for exporting, loading, and parsing the ICON blockchain data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%