# Example #1
 def test_read_col(self):
     """Check that the Spark JSON reader preserves the fixture's column count.

     Loads the gzipped fixture through the ETL wrapper and compares its
     column count against the same fixture read directly with pandas.
     """
     etl = ea.ETLAmazon()
     spark_df = etl.read_json(
         "thenextbestbook/etl/tests/data/test_file.json.gz")
     expected = pd.read_json("thenextbestbook/etl/tests/data/test_file.json")
     actual_col_count = spark_df.toPandas().shape[1]
     self.assertEqual(actual_col_count, expected.shape[1])
 def test_sql_cmd(self):
     """Verify a SQL query executes correctly against the test dataset.

     Reads the gzipped fixture, registers it as a global temp view, and
     asserts that COUNT(*) returns the known fixture size (263 rows).
     """
     etl = ea.ETLAmazon()
     runs = etl.read_json(
         "thenextbestbook/etl/tests/data/test_file.json.gz")
     # createOrReplaceGlobalTempView is idempotent: re-running the test in
     # the same Spark session no longer raises "view already exists".
     runs.createOrReplaceGlobalTempView("runs")
     # BUG FIX: the adjacent string literals previously concatenated to
     # "SELECT COUNT(*)FROM global_temp.runs" with no space before FROM.
     query_result = etl.sql_query("SELECT COUNT(*) FROM global_temp.runs")
     query_result_pd = query_result.toPandas()
     self.assertEqual(int(query_result_pd.iloc[0]), 263)
""" script to run ETL on Amazon data """
import etl_amazon as ea
import constants as ct

# Initiate Spark Session
etl = ea.ETLAmazon()

# Create variable 'book' to store book review JSON object
books = etl.read_json(ct.AMAZON_BOOKS_JSON)

# Create variable 'metadata' to store metadata JSON object
metadata = etl.read_json(ct.AMAZON_METADATA_JSON)

# Create global variables for spark SQL command
books.createGlobalTempView("books")
metadata.createGlobalTempView("metadata")

# Create variable 'books_with_title' to store result of SQL
books_with_title = etl.get_title_on_asin()

# Save result to JSON folder
books_with_title.write.format('json').save(ct.AMAZON_REVIEWS_DESTINATION)