Author (contact): Anil Pratap Singh (singh.ap79@gm*l.*o*)

===========================

pyDAN is an ongoing attempt to make the power of SQL analytics available to data scientists and statisticians while minimizing their dependence on database engineers. The idea is to combine all the tools at our disposal, namely SQL, Python, Java and Cython, in a single framework, giving the user an unprecedented ability to pull, clean, merge, munge and pre-analyze data from multiple sources stored across several platforms. The SQL standard is a high-level layer that provides a uniform (okay, nearly uniform) interface to a large number of database management systems. We propose to add one more layer (mostly in Python) that relieves the user of writing SQL (or most of it).
If you have beginner's skills in Python and SQL, that is all you need to get started. Just know that at the heart of pyDAN is the pydset class, which provides a pipe to an underlying table in a scratch SQLite database. The pydset class lets the user request manipulations and queries on the data.
- Manipulation: pydset.subset, pydset.join, pydset.union, pydset.unpivot, pydset.apply, pydset.aggregate
- Summaries: pydset.describe, pydset.sum/total/count, pydset.head/tail/show, pydset.mean/std/quantile/median, pydset.get1/get2
- Export and plotting: pydset.skeleton, pydset.to_numpy, pydset.to_csv, pydset.to_database, pydset.hist1D, pydset.graph1D
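To make the pydset idea concrete, here is a minimal stdlib-only sketch of a "scratch SQLite" layer that turns a method call into generated SQL. This is not pyDAN code; the class and method names below are illustrative only.

```python
import sqlite3

class ScratchTable:
    """Toy version of the pydset idea: a thin Python layer over a
    scratch SQLite table that generates SQL for the user."""

    def __init__(self, rows, columns):
        self.conn = sqlite3.connect(":memory:")   # scratch database
        self.name = "scratch"
        cols = ", ".join(columns)
        self.conn.execute(f"CREATE TABLE {self.name} ({cols})")
        marks = ", ".join("?" for _ in columns)
        self.conn.executemany(f"INSERT INTO {self.name} VALUES ({marks})", rows)

    def subset(self, condition):
        # The user supplies only a condition; the SELECT is generated here.
        cur = self.conn.execute(f"SELECT * FROM {self.name} WHERE {condition}")
        return cur.fetchall()

t = ScratchTable([(1, 10.0), (2, 20.0), (3, 30.0)], ["id", "price"])
print(t.subset("price > 15"))   # [(2, 20.0), (3, 30.0)]
```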
import sys
sys.path.insert(0, r"C:\Users\SingAn22\Desktop\PYDAN\Beta")  # raw string, so the backslashes are not treated as escapes
from pydan import *
We are working on several channels
- Enhancement of the range of supported data-input formats.
- Enhancement of analytics library.
- Enrichment of the SQL engine to automate generation of complex statements.
- Enrichment of remote data processing abilities.
- Enrichment of matplotlib interface for visualization.
The first obvious step is to request a beta version and start using it. We can then direct you to one of the several open lines of development in pyDAN.
pyDAN is a private project and has not yet been released for general consumption. We cannot guarantee backward compatibility at the present moment, but we do encourage people to start using the package as it matures into a release version.
#Tutorial 1: Data from Stock Market

##Introduction

We use a file from the Bombay Stock Exchange containing three years' worth of daily listings for a particular stock. Using pyDAN we will demonstrate how to build a simple analytic routine, ending with a couple of visualizations.
- Libraries
from pydan.pycsv import pycsv_reader
from pydan.pytools import month_enum, print_error, clean_string
from pypaint.pygraph import TGraph
from matplotlib import pyplot as plt
from matplotlib import ticker
from datetime import datetime
- Load data into SQLITE scratchpad (data ingestion)
##Register the source csv
csv = pycsv_reader("fullPathTo\\533098.csv")
##Load the data into the sqlite database, obtain a pydset object.
data = csv.to_database(tabName='pydanTable_prelim', database='fullPathTo\\dset.db')
- Change the Date format (String Level Manipulation)
#weave business requirements into a python function
def refine_date(element):
    try:
        myDate = element.split('-')
        myMnth = '{:02d}'.format(int(month_enum(myDate[1])))
        myDay = '{:02d}'.format(int(myDate[0]))
        myYear = myDate[2]
        newDate = '-'.join([myYear, myMnth, myDay])
        return newDate
    except Exception as err:
        print_error(err, 'pydan_test1.refine_date')
#Apply the business requirement to one of the columns
data.apply(colName='Dat', funcName='to_date', func=refine_date, argNames=['Dat'])
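The same date refinement can be done with the standard library alone, which is a handy way to sanity-check the logic outside pyDAN. This sketch assumes the BSE export uses 'DD-Mon-YYYY' dates (e.g. '03-Feb-2015'); adjust the format string if your file differs.

```python
from datetime import datetime

def refine_date_stdlib(element):
    # stdlib equivalent of refine_date: parse 'DD-Mon-YYYY'
    # and re-emit as ISO 'YYYY-MM-DD'.
    return datetime.strptime(element, "%d-%b-%Y").strftime("%Y-%m-%d")

print(refine_date_stdlib("03-Feb-2015"))  # 2015-02-03
```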
- Analyze data in each column and set types (Automated Data Recognition)
#Analyze the data in columns for numericity
colTypes = data.analyze_columns()
#transform columns to their proper datatypes.
dset = data.transform(resultTab='pydanTable', colTypes=colTypes)
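To illustrate what an automated type-recognition step does, here is a toy inference routine: it decides whether every non-empty string in a column parses as INTEGER, REAL, or TEXT. This is not pyDAN's actual algorithm, just a sketch of the idea.

```python
def infer_column_type(values):
    # Collect the kinds seen in the column, then pick the widest one.
    kinds = set()
    for v in values:
        v = v.strip()
        if not v:
            continue            # blanks carry no type information
        try:
            int(v)
            kinds.add("INTEGER")
        except ValueError:
            try:
                float(v)
                kinds.add("REAL")
            except ValueError:
                kinds.add("TEXT")
    if "TEXT" in kinds:
        return "TEXT"
    if "REAL" in kinds:
        return "REAL"
    return "INTEGER" if kinds else "TEXT"

print(infer_column_type(["12", "15", "9"]))   # INTEGER
print(infer_column_type(["12.5", "7"]))       # REAL
print(infer_column_type(["12.5", "n/a"]))     # TEXT
```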
- Collapse the daily data to weekly level.
#Create logic for a unique marker for a year-week combo
def unqWeekMarker(date):
    year = int(datetime.strptime(str(date), '%Y-%m-%d').date().strftime('%Y'))
    week = int(datetime.strptime(str(date), '%Y-%m-%d').date().strftime('%W'))
    return 100 * year + week
#Time to add this new column to the dset
dset.addColumn("unqWeek", "NUMBER", "to_unwk", unqWeekMarker, argNames=['date'])
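The week marker can be verified standalone. This condensed variant (not pyDAN code) uses a single strptime call and the same '%W' week convention (Monday-first weeks, days before the year's first Monday counted as week 0):

```python
from datetime import datetime

def week_marker(date_str):
    # 100*year + week-of-year gives a sortable year-week integer,
    # e.g. week 45 of 2012 -> 201245.
    d = datetime.strptime(date_str, "%Y-%m-%d")
    return 100 * int(d.strftime("%Y")) + int(d.strftime("%W"))

print(week_marker("2012-11-05"))  # 201245
```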
#Now is the time to collapse (weekly averages)
dt3 = dset.aggregate(resultTab='WEEKLY',
                     aggOp=' AVG ',
                     varNames=['NoofShares', 'ClosePrice', 'TotalTurnoverRs',
                               'LowPrice', 'HighPrice', 'SpreadCloseOpen'],
                     groupBy=['unqWeek'],
                     orderBy=['unqWeek'],
                     conditions=['1=1'])
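Under the hood, an aggregate call like the one above boils down to a GROUP BY query. Here is the kind of SQL it corresponds to, run directly against a scratch SQLite database with made-up numbers (table and column names mirror the tutorial but the data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pydanTable (unqWeek INTEGER, LowPrice REAL)")
conn.executemany("INSERT INTO pydanTable VALUES (?, ?)",
                 [(201245, 10.0), (201245, 14.0), (201246, 20.0)])

# AVG per week, filtered by the trivial '1=1' condition, ordered by week.
rows = conn.execute(
    "SELECT unqWeek, AVG(LowPrice) FROM pydanTable "
    "WHERE 1=1 GROUP BY unqWeek ORDER BY unqWeek").fetchall()
print(rows)   # [(201245, 12.0), (201246, 20.0)]
```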
- Simple Visualization
#Graph low price against week-id
gr = dt3.graph1D(outfile='o.db',
                 varNameX='unqWeek',
                 varNameY='LowPrice',
                 plotName='lPrice_Graph',
                 plotTitle='Low Price (Weekly)')

#Graph high price against week-id
gr2 = dt3.graph1D(outfile='o.db',
                  varNameX='unqWeek',
                  varNameY='HighPrice',
                  plotName='hPrice_Graph',
                  plotTitle='High Price (Weekly)')
#Create the canvas
fig = plt.figure(facecolor='white')
ax = fig.add_subplot(1, 1, 1)
gr.draw(ax)
gr2.draw(ax)
- Cosmetic Settings
#Logic to turn an ugly weekId (201245) into a readable marker (2012-45)
def make_label(value, pos):
    return str(int(value / 100)) + "-" + str(int(value - int(value / 100) * 100))
#Set xaxis labels using this logic
ax.xaxis.set_major_formatter(ticker.FuncFormatter(make_label))
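The label conversion can be checked in isolation. This standalone divmod variant (not pyDAN code) mirrors the same arithmetic; the unused pos argument is only there because matplotlib's FuncFormatter passes a tick position alongside the value:

```python
def make_label_divmod(value, pos=None):
    # 201245 -> year 2012, week 45 -> '2012-45'
    year, week = divmod(int(value), 100)
    return f"{year}-{week}"

print(make_label_divmod(201245))  # 2012-45
```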
#Add a grid
ax.grid(True)
- Finally
plt.show()