Language-integrated querying for Python 3 using the BOLDR framework.
This project is the result of a six-week internship at the LRI in Orsay, in which I was tasked with building a Python library that lets Python developers write database queries using idiomatic Python constructs such as comprehensions and user-defined functions.
For instance, for this user-defined function:
def at_least(salary):
    return [{'name': e.name}
            for e in employees
            if e.salary < salary]
This library would generate the following SQL query:
SELECT e.name
FROM employees AS e
WHERE e.salary < {salary}
Here, {salary} acts as a placeholder for the value of salary passed to the function.
Under the hood, the qir module uses introspection to translate Python code into an intermediate representation, the QIR, at runtime. This representation is then translated into the desired query language (SQL, HiveQL, JSON, etc.) by the BOLDR framework, which is being built at the LRI.
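To make the second stage more concrete, here is a toy sketch of rendering an intermediate term as SQL. The tuple-shaped term and the functions expr_to_sql and to_sql are hypothetical stand-ins invented for this illustration; they are not the actual QIR, nor the BOLDR API.

```python
# Hypothetical mini-term for the at_least example; the real QIR is richer.
term = ("project", ["name"],
        ("filter", ("<", ("attr", "e", "salary"), ("param", "salary")),
         ("scan", "employees", "e")))


def expr_to_sql(expr):
    """Render a toy expression node as an SQL fragment."""
    kind = expr[0]
    if kind == "attr":
        _, var, field = expr
        return f"{var}.{field}"
    if kind == "param":
        # Parameters are kept as named placeholders such as {salary}.
        return "{" + expr[1] + "}"
    # Anything else is treated as a binary operator node: (op, left, right).
    op, left, right = expr
    return f"{expr_to_sql(left)} {op} {expr_to_sql(right)}"


def to_sql(term):
    """Render a project/filter/scan toy term as a single SELECT statement."""
    _, fields, (_, cond, (_, table, alias)) = term
    columns = ", ".join(f"{alias}.{f}" for f in fields)
    return (f"SELECT {columns}\n"
            f"FROM {table} AS {alias}\n"
            f"WHERE {expr_to_sql(cond)}")


print(to_sql(term))
```

Because the term is plain data, a second renderer for another target language could walk the same structure without touching the Python front end.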
Introspection is done through CPython's dis module, which allows the bytecode of any function to be inspected at runtime. This bytecode is then converted into a QIR term using a symbolic stack machine described in the internship report. I chose dis over inspect because it handles anonymous functions much better and because, let's face it, translating bytecode into a lambda-calculus-like representation is quite fun.
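The idea of a symbolic stack machine can be sketched in a few lines. The snippet below is only an illustration, not the qir module's implementation: the function to_term and the tuple-based term shape are hypothetical, and only a handful of opcodes are handled. It relies on dis.get_instructions from the standard library and pushes descriptions of values onto the stack instead of the values themselves.

```python
import dis


def to_term(func):
    """Symbolically execute a tiny subset of CPython bytecode.

    The stack holds nested tuples describing how each value would be
    computed. Opcode names vary between CPython versions, so only a few
    common ones are handled; anything else raises NotImplementedError.
    """
    stack = []
    for ins in dis.get_instructions(func):
        name = ins.opname
        if name.startswith("LOAD_FAST"):
            # Some versions fuse two loads into one instruction (tuple argval).
            names = ins.argval if isinstance(ins.argval, tuple) else (ins.argval,)
            stack.extend(("var", n) for n in names)
        elif name == "LOAD_ATTR":
            # Replace the object on top of the stack by an attribute access.
            stack.append(("attr", stack.pop(), ins.argval))
        elif name == "COMPARE_OP":
            right, left = stack.pop(), stack.pop()
            stack.append((ins.argval, left, right))
        elif name == "RETURN_VALUE":
            return stack.pop()
        elif name in ("RESUME", "NOP", "CACHE"):
            continue  # bookkeeping instructions with no stack effect
        else:
            raise NotImplementedError(name)
    raise ValueError("no RETURN_VALUE found")


# An anonymous function works as well as a named one.
predicate = lambda e, salary: e.salary < salary
print(to_term(predicate))
```

For this predicate, the resulting term should be the nested tuple ('<', ('attr', ('var', 'e'), 'salary'), ('var', 'salary')). Since the bytecode is read directly from the function object, no source text is needed, which is why anonymous functions pose no special difficulty.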
Communication between the Python client and the BOLDR server is achieved using Protocol Buffers and gRPC.
The share directory contains:
- a 20-page internship report that explains the whole translation procedure in detail;
- a Beamer presentation (written in French) that summarizes the internship.