
vbalasu/trifacta


trifacta

A Trifacta client that makes it easy to integrate Trifacta into your production and data science workflows.

Usage Scenarios

  • Jupyter: Invoke Trifacta jobs from a Jupyter notebook and pass data back and forth between Jupyter and Trifacta
  • Other Notebooks: Integrate Trifacta with Azure Databricks, Zeppelin or any other notebook-style interface that supports Python
  • Scripts: Automate Trifacta jobs and input/output using Python scripts that can be easily executed from the command line or called from an external scheduler

Functionality

This library makes it simple to do the following:

  1. Connect to a Trifacta instance
  2. Run a job
  3. Download results to a CSV file and view them in a pandas DataFrame

Note that file uploads and downloads are performed through Amazon S3 using the boto3 API.
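Since file transfer goes through S3, supplying new input amounts to writing a file to a bucket your Trifacta instance can read. Below is a minimal sketch using boto3's `upload_file`; the helper name `upload_input`, the bucket and the key are hypothetical, and the client is passed in rather than created so the function can be tried without AWS credentials. It does not use any upload function of the `trifacta` package itself.

```python
from pathlib import Path


def upload_input(s3_client, local_path, bucket, key=None):
    """Upload a local file to S3 so a Trifacta dataset can read it.

    s3_client is expected to be a boto3 S3 client (boto3.client('s3')).
    upload_input, bucket and key are hypothetical placeholders for your
    own helper name, Trifacta bucket and input path.
    Returns the s3:// URI of the uploaded object.
    """
    path = Path(local_path)
    if key is None:
        key = path.name  # default: store the file under its own file name
    s3_client.upload_file(Filename=str(path), Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_input(boto3.client('s3', region_name='us-west-2'), 'input.csv', 'your-bucket', 'path/to/input.csv')`, with the bucket and path replaced by your own.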

#!pip install trifacta
import trifacta

If you need an API access token, you can generate one in the Trifacta UI.

# Step 1: Connect to Trifacta by providing the URL and API access token
t = trifacta.Client('http://partnerdemo.amer.trifacta.net:3005', 'YOUR_ACCESS_TOKEN')
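Hard-coding the token as in Step 1 risks leaking it when the notebook or script is shared. One alternative, sketched here under the assumption that you export two environment variables (the names `TRIFACTA_URL` and `TRIFACTA_TOKEN`, and the helper `trifacta_credentials`, are hypothetical), is to read both values at runtime:

```python
import os


def trifacta_credentials(url_var="TRIFACTA_URL", token_var="TRIFACTA_TOKEN"):
    """Return (url, token) for trifacta.Client from environment variables.

    The default variable names are hypothetical; keeping the token out of
    the notebook avoids committing it to version control.
    """
    try:
        return os.environ[url_var], os.environ[token_var]
    except KeyError as missing:
        # KeyError.args[0] is the name of the variable that was not set
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None
```

`t = trifacta.Client(*trifacta_credentials())` then passes the URL and token in the same order as Step 1.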

Get the wrangled dataset ID from its URL in the Trifacta UI.
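Reading the ID by eye works for a single job; when scripting, a small parser can help. This sketch assumes the wrangled dataset ID is the last all-digit segment of the URL path (for example, a hypothetical URL ending in `/data/5/23`). The helper name `dataset_id_from_url` is hypothetical, and the URL layout should be checked against what your Trifacta version actually shows before relying on it.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Return the wrangled dataset ID found in a Trifacta UI URL.

    Assumption (not confirmed by the trifacta package): the ID is the
    last all-digit segment of the URL path. Query strings are ignored.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    for segment in reversed(segments):
        if segment.isdecimal():
            return int(segment)
    raise ValueError(f"no numeric ID found in {url!r}")
```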

Make sure that you have run the job manually at least once.

Note the output path, and be sure to set it to "replace" so that each run overwrites the same output file.
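With "replace" set, every run writes to the same S3 key, so the file downloaded later could also be left over from an earlier run. A minimal sanity check, assuming a boto3 S3 client, compares the object's `LastModified` timestamp from `head_object` with the time you started the job; the helper name `output_refreshed_since` is hypothetical.

```python
from datetime import datetime, timezone


def output_refreshed_since(s3_client, bucket, key, since: datetime) -> bool:
    """Return True if s3://bucket/key was last written at or after `since`.

    s3_client is expected to be a boto3 S3 client; its head_object response
    includes 'LastModified' as a timezone-aware datetime. A naive `since`
    is assumed to be in UTC.
    """
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    last_modified = s3_client.head_object(Bucket=bucket, Key=key)["LastModified"]
    return last_modified >= since
```

Capture `started = datetime.now(timezone.utc)` just before running the job and call the helper afterwards. S3 timestamps may be truncated to whole seconds and depend on reasonably synchronized clocks, so treat a `False` result as a prompt to investigate rather than proof that the job failed.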


# Step 2: Run the job
t.run_job(23)  # 23 is the wrangled dataset ID taken from the Trifacta UI URL
About to run job
{'sessionId': '9d339e65-8898-4165-871b-b9db848dc099', 'reason': 'JobStarted', 'jobGraph': {'vertices': [76, 77], 'edges': [{'source': 76, 'target': 77}]}, 'id': 42, 'jobs': {'data': [{'id': 76}, {'id': 77}]}}
2020-02-25 11:19:58.508231 InProgress
2020-02-25 11:20:03.700189 InProgress
2020-02-25 11:20:08.887794 Complete
True
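The log above suggests that `run_job` polls the job status roughly every five seconds until it reports `Complete`, then returns `True`. If you need the same pattern with a timeout (for example, when an external scheduler runs the script), it can be sketched generically. `get_status` is a hypothetical callable you would supply, this is not the `trifacta` package's internal code, and the failure status names are assumptions.

```python
import time
from datetime import datetime


def wait_for_job(get_status, interval=5.0, timeout=3600.0,
                 failure_states=("Failed", "Canceled"),
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "Complete"; return True on success.

    get_status is a hypothetical zero-argument callable returning a status
    string such as "InProgress" or "Complete". The failure_states names are
    assumptions. Raises RuntimeError on a failure status and TimeoutError
    once `timeout` seconds have passed without completion.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        print(datetime.now(), status)  # same shape as the log lines above
        if status == "Complete":
            return True
        if status in failure_states:
            raise RuntimeError(f"job ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

`sleep` and `clock` are parameters only so the loop can be exercised without real waiting.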
%env AWS_PROFILE=trifacta_master_trial
env: AWS_PROFILE=trifacta_master_trial
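`%env` is an IPython magic, so it is unavailable in a plain Python script (the Scripts scenario above). Setting the variable through `os.environ` before the first boto3 client is created has the same effect; the profile name is the one used in this example.

```python
import os

# Must run before the first boto3.client(...) call: boto3 reads AWS_PROFILE
# when its default session is created.
os.environ["AWS_PROFILE"] = "trifacta_master_trial"
```

Alternatively, `boto3.Session(profile_name='trifacta_master_trial').client('s3', region_name='us-west-2')` selects the profile explicitly without touching the environment.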
# Step 3: Download results to a CSV file and view them in a pandas DataFrame
import boto3
s3 = boto3.client('s3', region_name='us-west-2')
s3.download_file(Bucket='trifacta-partnerdemo-trifactabucket-kkcpnw234feu',
                Key='trifacta/queryResults/admin@trifacta.local/MarketingAnalytics.csv',
                Filename='MarketingAnalytics.csv')
import pandas as pd
df = pd.read_csv('MarketingAnalytics.csv')
df.head()
user_id customerkey event_type event_subtype Date advertiser_id creative_id url product_id domain_url ... customeraccount_number customerphone customeraddress cusotmerstate customerzipcode customercountry socialmedia totalsale Outlier_Identifier currencykey
0 1126310400000-424 1126310400000-424 click click 10-19-2005 164332 543027 http://zdnet.com/praesent/lectus/vestibulum/qu... 1124064000000-475 zdnet ... 310170445527596 (817)718-7309 156 Cozy Berry Arc CA 78710 USA deneleaf 7004.54 False 1
1 1229126400000-20 1229126400000-20 click click 08-17-2009 164332 252030 http://hostgator.com/a/feugiat.js?pid=12331008... 1233100800000-528 hostgator ... 310150240507900 (469)201-1812 3641 Euismod Avenue CA 10769 USA kinphanng 4853.35 False 1
2 1126828800000-518 1126828800000-518 view view 04-05-2006 164332 562765 http://fc2.com/convallis/duis/consequat/dui/ne... 1121904000000-509 fc2 ... 310170133079761 (443)585-1769 Ap #543-7410 Accumsan Rd. CA 92845 USA waldeelbailarin 6885.15 False 1
3 1130112000000-336 1130112000000-336 click click 04-05-2006 164332 466942 http://biblegateway.com/est/phasellus/sit/amet... 1130284800000-343 biblegateway ... 310120073380564 (215)669-3055 900-8123 Aliquam Av. CA 85517 USA charlrey 2593.31 False 1
4 1121990400000-216 1121990400000-216 view view 09-27-2005 164332 400316 https://zdnet.com/elementum/nullam/varius/null... 1108339200000-416 zdnet ... 310160496868669 301 742 1112 164 Cozy Anchor Rd CA 60101 USA scottylago 3958.25 False 1

5 rows × 31 columns
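To reuse Step 3 for other outputs, the download can be wrapped in a small helper that names the local file after the last part of the S3 key. A sketch, assuming a boto3 S3 client is passed in; `download_results` is a hypothetical name:

```python
import posixpath


def download_results(s3_client, bucket, key, filename=None):
    """Download s3://bucket/key to a local file and return the local filename.

    s3_client is expected to be a boto3 S3 client. By default the local
    file is named after the last component of the key.
    """
    if filename is None:
        filename = posixpath.basename(key)
        if not filename:
            raise ValueError(f"key {key!r} does not name a file")
    s3_client.download_file(Bucket=bucket, Key=key, Filename=filename)
    return filename
```

`df = pd.read_csv(download_results(s3, 'your-bucket', 'path/to/output.csv'))` then yields a DataFrame in the same way as Step 3, with the bucket and key replaced by your own.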
