salexkidd/PyHive

 
 


PyHive

PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive.

Usage

DB-API

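from pyhive import presto

# Connect to the Presto server on localhost and open a DB-API cursor
cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())  # first row of the result
print(cursor.fetchall())  # all remaining rows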
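
Because the cursor above follows the Python DB-API (PEP 249), the same sequence of calls can be tried without a Presto server. The sketch below uses the standard-library `sqlite3` module purely as a stand-in. The table contents are made up for illustration, and nothing here exercises PyHive itself.

```python
import sqlite3

# Stand-in for presto.connect('localhost'): an in-memory SQLite database.
# (Assumption: used only to illustrate the DB-API calls; not part of PyHive.)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Hypothetical sample data so the query has something to return.
cursor.execute('CREATE TABLE my_awesome_data (id INTEGER, name TEXT)')
cursor.executemany(
    'INSERT INTO my_awesome_data VALUES (?, ?)',
    [(1, 'a'), (2, 'b'), (3, 'c')],
)

cursor.execute('SELECT * FROM my_awesome_data ORDER BY id LIMIT 10')
first_row = cursor.fetchone()   # next row of the result, or None when exhausted
remaining = cursor.fetchall()   # every row not yet fetched, as a list of tuples
print(first_row)
print(remaining)
conn.close()
```

Note that `fetchall()` returns only the rows that `fetchone()` has not already consumed, which is why the two calls in the usage example do not overlap.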

SQLAlchemy

First install this package to register it with SQLAlchemy (see setup.py).

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

# The 'presto://' URL scheme is available once this package is installed
engine = create_engine('presto://localhost:8080/hive/default')
# Reflect the table definition from Presto, then count its rows
logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=logs).scalar())
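
To make the connection URL above easier to read, here is a minimal sketch that splits it into its parts using only the standard library. It assumes the path is interpreted as `/<catalog>/<schema>` (so `hive` is the catalog and `default` the schema). The helper `describe_presto_url` is hypothetical and is not part of PyHive or SQLAlchemy.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2.7


def describe_presto_url(url):
    """Split a presto:// URL into parts (hypothetical helper for illustration)."""
    parts = urlsplit(url)
    # Assumption: the path is read as /<catalog>/<schema>.
    segments = [s for s in parts.path.split('/') if s]
    return {
        'dialect': parts.scheme,
        'host': parts.hostname,
        'port': parts.port,
        'catalog': segments[0] if len(segments) > 0 else None,
        'schema': segments[1] if len(segments) > 1 else None,
    }


info = describe_presto_url('presto://localhost:8080/hive/default')
print(info)
```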

Requirements

  • Python 2.7
  • For Presto: a Presto installation
  • For Hive
    • HiveServer2 daemon
    • TCLIService (from Hive in /usr/lib/hive/lib/py)
    • thrift_sasl (from Cloudera)

Testing

Run the following in an environment with Hive/Presto:

./scripts/make_test_tables.sh
virtualenv env
source env/bin/activate
pip install -r dev_requirements.txt
py.test

WARNING: This drops/creates tables named one_row, one_row_complex, and many_rows.
