select_py

WARNING: I wrote this a long time ago and it does not represent my current abilities :D

A simple Python interface to SQL: launch database queries (from a remote server if needed) and dump the results into a tab-delimited text file on your machine.

Typical use-cases:

  • You want to edit your SQL in an IDE locally.
  • You want to store the results of a query in a delimited text file, so you can load it into Excel, IPython, etc. for analysis.

NEW: Support for Cloudera Impala.
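For instance, here's a minimal sketch of an Impala job, assuming the Impala class mirrors the Hive interface described below; the server name, table, and output path are placeholders:

>>> from select_py import Impala
>>> impala_job = Impala("select count(*) from some_table", "impala_server", "counts.tsv")
>>> impala_job.execute()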

Why do you care?

  • There's a standard procedure for storing Hive output on your machine: scp a query file to a server where the SQL can be executed, ssh to the server, run the job so that the output is saved to a file, scp the output file back to your local machine, and ssh back home. Why write code to do this more than once?
  • I use the same flags every time I write a script calling the SQL Server command line utility or the Impala CLI.
  • Handling job errors and cleaning the output of a SQL Server job executed via the command line is annoying.

How do you use it?

select_py.Hive, select_py.SQLServer, and select_py.Impala take the following arguments -- all strings -- upon initialization:

Argument   Description
query      Text of the query
server     Server from which the SQL will be executed
outfile    Path of the file in which output will be stored

They each have the following methods:

Method                     Description
.format(*args, **kwargs)   Format the query string using Python's str.format()
.execute()                 Execute the query
.output_summary()          Apply os.stat to the output file and return the result

Here's a simple example:

$ python
>>> import datetime as dt
>>> from select_py import Hive
>>> q = """ -- show column names in output
... set hive.cli.print.header=true;
... select *
... from some_table
... where date >= '{date}'
... limit 10"""
>>> r = "hadoop_server"
>>> o = "my_output_file.tsv"
>>> hive_job = Hive(q,r,o)
>>> hive_job.format(date = str(dt.date.today() - dt.timedelta(days=3)))
>>> hive_job.execute()
... Hive's stderr prints to screen ...
>>> # Let's check that the output file is non-empty
>>> hive_job.output_summary().st_size
12345
>>> # Hooray!

Want to write your own API for a different SQL engine?

Given any other SQL software with a command line interface, it should be easy to write an associated child of select_py.Base providing the same API. The code for the child will need a custom version of each of the following (see the sketch after the list):

  • cls.CMD: A shell command in a string, which will be formatted according to self.__init__ parameters.
  • self._set_up: Code to be executed before the SQL executes, e.g. sending temporary files to a server.
  • self._tear_down: Code to be executed after the SQL executes, e.g. deleting temporary files.
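As an illustration, here is a rough sketch of a hypothetical Postgres child built around psql. The placeholder names in CMD and the helper bodies are assumptions about how Base formats and calls them, not documented behaviour:

from select_py import Base

class Postgres(Base):
    # Hypothetical command template: assumes Base fills in {server},
    # {query_file}, and {outfile} from the __init__ arguments before
    # running the command.
    CMD = ("psql --host={server} --no-align --field-separator='\t' "
           "--file={query_file} --output={outfile}")

    def _set_up(self):
        # e.g. write self.query to a temporary query file before execution
        pass

    def _tear_down(self):
        # e.g. delete the temporary query file afterwards
        pass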

(potential) Issues

  • In order for Hive and Impala to run properly, your SSH must be configured so that the server where you normally run Hive queries can be accessed without a password/passphrase prompt.
  • SQLServer requires Windows Authentication for access to SQL Server.
  • I wrote this for myself to use with Cygwin on Windows (lol), and haven't tested it out in different environments.

to do

  • server arg should have a None option
  • outfile option should allow writing to STDOUT or None
  • Add SQLite implementation.
  • Add a factory function allowing you to choose the SQL flavor (see the sketch after this list).
  • Make a docopt-powered CLI
  • Set headers as default in Hive
  • Rename resource arg as server in __init__
  • Add the usual packaging with setuptools.
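A hypothetical sketch of what that factory could look like, assuming the existing classes keep a shared (query, server, outfile) signature:

from select_py import Hive, Impala, SQLServer

# Hypothetical factory; not part of select_py yet.
_FLAVORS = {"hive": Hive, "impala": Impala, "sqlserver": SQLServer}

def make_job(flavor, query, server, outfile):
    # Look up the class for the requested flavor and instantiate it.
    try:
        cls = _FLAVORS[flavor.lower()]
    except KeyError:
        raise ValueError("unknown SQL flavor: {0}".format(flavor))
    return cls(query, server, outfile)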
