PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive.

DB-API
------
.. code-block:: python

    from pyhive import presto
    cursor = presto.connect('localhost').cursor()
    cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
    print cursor.fetchone()
    print cursor.fetchall()

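Because the cursor follows the Python DB-API, generic DB-API helpers should work with it unchanged. The sketch below turns fetched rows into dictionaries keyed by column name, using the standard ``cursor.description`` attribute. ``rows_as_dicts`` is a hypothetical helper, not part of PyHive, and the in-memory ``sqlite3`` database only stands in for a Presto connection so that the example runs without a server.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return all remaining rows as dicts keyed by column name."""
    # DB-API: description is a sequence of 7-item entries; item 0 is the column name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Assumption: sqlite3 stands in for presto.connect('localhost').cursor(),
# purely so this sketch can run without a Presto server.
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE my_awesome_data (id INTEGER, name TEXT)')
cursor.executemany('INSERT INTO my_awesome_data VALUES (?, ?)',
                   [(1, 'a'), (2, 'b')])
cursor.execute('SELECT id, name FROM my_awesome_data ORDER BY id LIMIT 10')
records = rows_as_dicts(cursor)
print(records)
```

With a real Presto cursor, only the connection lines would change; the helper relies on nothing beyond ``description`` and ``fetchall()``.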

SQLAlchemy
----------
First install this package to register it with SQLAlchemy (see ``setup.py``).

.. code-block:: python

    from sqlalchemy import *
    from sqlalchemy.engine import create_engine
    from sqlalchemy.schema import *
    engine = create_engine('presto://localhost:8080/hive/default')
    logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
    print select([func.count('*')], from_obj=logs).scalar()

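The engine URL in the example appears to follow the pattern ``presto://host:port/catalog/schema``, so it points at the ``hive`` catalog and ``default`` schema of a Presto server on ``localhost:8080``. The sketch below spells out that reading with the standard library only. The interpretation of the path is an assumption drawn from the example, and ``split_presto_url`` is a hypothetical helper, not part of PyHive or SQLAlchemy.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def split_presto_url(url):
    """Split a presto:// URL into host, port, catalog and schema."""
    parsed = urlparse(url)
    # Assumption: the path is 'catalog/schema'; either part may be absent.
    path_parts = [part for part in parsed.path.split('/') if part]
    catalog = path_parts[0] if path_parts else None
    schema = path_parts[1] if len(path_parts) > 1 else None
    return {
        'host': parsed.hostname,
        'port': parsed.port,
        'catalog': catalog,
        'schema': schema,
    }


parts = split_presto_url('presto://localhost:8080/hive/default')
print(parts)
```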

Requirements
------------

- Python 2.7
- For Presto: Just a Presto install
- For Hive

  - HiveServer2 daemon
  - ``TCLIService`` (from Hive in ``/usr/lib/hive/lib/py``)
  - ``thrift_sasl`` (from Cloudera)

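For Hive, a quick way to see whether the two Python modules listed above are importable is to try importing them. This is only a minimal sketch: ``missing_modules`` is a hypothetical helper, not part of PyHive, and it checks importability only, not versions or the HiveServer2 daemon.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names taken from the requirements list for Hive.
hive_modules = ['TCLIService', 'thrift_sasl']
print(missing_modules(hive_modules))
```

An empty list means both modules were found on the current Python path.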

Testing
-------
Run the following in an environment with Hive/Presto::

    ./scripts/make_test_tables.sh
    virtualenv env
    source env/bin/activate
    pip install -r test_requirements.txt
    nosetests

WARNING: This drops/creates tables named ``one_row``, ``one_row_complex``, and ``many_rows``.