# pyDAN: TAKING DATA TO ALGORITHMS

Author (contact): Anil Pratap Singh (singh.ap79@gm*l.*o*)

## What is it?

pyDAN is an ongoing attempt to make the power of SQL analytics available to data scientists and statisticians, and to minimize their dependence on database engineers. The idea is to bring all the tools at our disposal, namely SQL, Python, Java and Cython, into a single framework and give the user an unprecedented ability to pull, clean, merge, munge and pre-analyze data from multiple sources stored across several platforms.

## How do we do it?

The SQL standard is a high-level layer that provides a uniform (okay, nearly uniform) interface to a large number of database management systems. We propose to add one more layer (mostly in Python) that relieves the user from writing SQL code (or most of it):
      ## Set the data sources
      csv = pycsv_reader("mycsvfile.csv")
      remote = sql_reader("myurl", "myuname", "mypasswd")

      ## Read into the sqlite scratch area
      dset1 = csv.to_database(tabName="mycsvtab", database="mysqlite.db")
      dset2 = remote.to_database(tabName="myremotetab", database="mysqlite.db")

      ## Let us do a join
      data = dset1.join(resultTab="mymergedtab", d2=dset2,
                        varNames1=['*'], varNames2=['var2'],
                        on=["csv_var1=sql_var2"])

      ## Apply my business logic to var1, var2 and add the result in var_X
      data.addColumn("var_X", "number", "business_logic1", bizLogicFunc, argNames=['var1', 'var2'])

      ## Dump the results into a csv file
      data.to_csv("myresultdata.csv")

For every action requested on the data, pyDAN generates the appropriate SQL under the hood and executes it using the pysqlite library. The result tables (if any) are tracked each time, and they too become available in the program for further manipulation.
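To give a flavour of what happens under the hood, here is a minimal, self-contained sketch of the idea: a join request is turned into a SQL string and executed against a scratch sqlite database. It uses Python's built-in sqlite3 module, and the table and column names (t1, t2, csv_var1, sql_var2, var1, var2) are invented for the example; this is not pyDAN's actual code.

    import sqlite3

    # Hypothetical sketch of generating SQL for a join request and executing it.
    def run_join(conn, result_tab, t1, t2, cols1, cols2, on_clause):
        select_list = ", ".join(["%s.%s" % (t1, c) for c in cols1] +
                                ["%s.%s" % (t2, c) for c in cols2])
        sql = ("CREATE TABLE %s AS SELECT %s FROM %s JOIN %s ON %s"
               % (result_tab, select_list, t1, t2, on_clause))
        conn.execute(sql)      # sqlite executes the generated statement
        return result_tab      # the result table is tracked for later manipulation

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (csv_var1 INTEGER, var1 REAL)")
    conn.execute("CREATE TABLE t2 (sql_var2 INTEGER, var2 REAL)")
    conn.execute("INSERT INTO t1 VALUES (1, 10.0)")
    conn.execute("INSERT INTO t2 VALUES (1, 20.0)")
    run_join(conn, "mymergedtab", "t1", "t2", ["var1"], ["var2"],
             "t1.csv_var1 = t2.sql_var2")
    print(conn.execute("SELECT * FROM mymergedtab").fetchall())   # [(10.0, 20.0)]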

## What do I need to learn?

Beginner-level skills in Python and SQL are all you need to get started. Just know that at the heart of pyDAN is the pydset class, which provides a pipe to an underlying table in the sqlite scratch area; a pydset lets the user request manipulations and queries on that data (a minimal sketch of this pattern follows Table 1).

Table 1: A part of the pydset interface.
pydset.subset
pydset.join
pydset.union
pydset.unpivot
pydset.apply
pydset.aggregate
pydset.describe
pydset.sum/total/count
pydset.head/tail/show
pydset.mean/std/quantile/median
pydset.get1/get2
pydset.skeleton
pydset.to_numpy
pydset.to_csv
pydset.to_database
pydset.hist1D
pydset.graph1D
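To make the "pipe to an underlying table" idea concrete, here is a minimal illustrative sketch of the pattern, not pyDAN's implementation; the TablePipe name and its methods are made up for the example. An object of this kind only remembers a connection and a table name and generates SQL on demand, much as pydset does with the far richer interface listed in Table 1.

    import sqlite3

    class TablePipe:
        """Toy stand-in for the pydset idea: wrap a connection and a table name."""
        def __init__(self, conn, table):
            self.conn = conn
            self.table = table

        def count(self):
            # COUNT(*) over the wrapped table
            return self.conn.execute("SELECT COUNT(*) FROM %s" % self.table).fetchone()[0]

        def head(self, n=5):
            # First n rows, in the spirit of pydset.head/show
            return self.conn.execute("SELECT * FROM %s LIMIT %d" % (self.table, n)).fetchall()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (day TEXT, close REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)",
                     [("2012-11-05", 101.5), ("2012-11-06", 103.2)])
    pipe = TablePipe(conn, "prices")
    print(pipe.count())   # 2
    print(pipe.head(1))   # [('2012-11-05', 101.5)]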
## How do I start?

The pyDAN project is still in its infancy. The devel version can be obtained from this repository, but there is no guarantee that it will work. A more stable and tested beta version can be obtained on request from the author. The beta version is supplied as a single Python source file and can be made available in your analysis by doing:
    import sys
    sys.path.insert(0, r"C:\Users\SingAn22\Desktop\PYDAN\Beta")
    from pydan import *

## What can I expect in the future?

We are working on several fronts:

  • Enhancement of the supported formats for data input.
  • Enhancement of analytics library.
  • Enrichment of the SQL engine to automate generation of complex statements.
  • Enrichment of remote data processing abilities.
  • Enrichment of matplotlib interface for visualization.

## How can I contribute?

Well, the first obvious step is to request a beta version and start using it. We can then direct you to one of the several open lines of development in pyDAN.

## Disclaimer

pyDAN is a private project and has not yet been released for general consumption. We cannot guarantee backward compatibility at the present moment, but we do encourage people to start using the package as it matures into a release version.




# Tutorial 1: Data from the Stock Market

## Introduction

We use a file from the Bombay Stock Exchange containing three years' worth of daily listings for a particular stock. Using pyDAN we will demonstrate how to build a simple analytic routine; the walkthrough below also produces a couple of simple visualizations.

## A Walkthrough

  1. Libraries
      from pydan.pycsv import pycsv_reader
      from pydan.pytools import month_enum, print_error, clean_string
      from pypaint.pygraph import TGraph

      from matplotlib import pyplot as plt
      from matplotlib import ticker
      from datetime import datetime

  2. Load data into SQLITE scratchpad (data ingestion)
      ##Register the source csv
      csv = pycsv_reader("fullPathTo\\533098.csv")
    

      ## Load the data into the sqlite database, obtain a pydset object.
      data = csv.to_database(tabName='pydanTable_prelim', database='fullPathTo\\dset.db')

  3. Change the Date format (String Level Manipulation)
      #weave business requirements into a python function
      def refine_date(element):
        try:
          myDate = element.split('-')
          myMnth = '{:02d}'.format(int(month_enum(myDate[1])))
          myDay  = '{:02d}'.format(int(myDate[0]))
          myYear = myDate[2]
          newDate = '-'.join([myYear,myMnth,myDay])
          return newDate
        except Exception as err:
          print_error(err,'pydan_test1.refine_date')
    

      #Apply the business requirement to one of the columns
      data.apply(colName='Dat', funcName='to_date', func=refine_date, argNames=['Dat'])

  4. Analyze data in each column and set types (Automated Data Recognition)
      #Analyze the data in columns for numericity
      colTypes = data.analyze_columns()
    

      #Transform columns to their proper datatypes.
      dset = data.transform(resultTab='pydanTable', colTypes=colTypes)

  5. Collapse the daily data to weekly level.
      #Create logic for a unique marker for the year-week combo
      #(a standalone check of this helper appears after the walkthrough)
      def unqWeekMarker(date):
         year = int(datetime.strptime(str(date),'%Y-%m-%d').date().strftime('%Y'))
         week = int(datetime.strptime(str(date),'%Y-%m-%d').date().strftime('%W'))
         return 100*year + week

      #Time to add this new column to the dset
      dset.addColumn("unqWeek","NUMBER","to_unwk",unqWeekMarker,argNames=['date'])

      #Now is the time to collapse (weekly averages)
      dt3 = dset.aggregate(resultTab='WEEKLY', aggOp=' AVG ',
                           varNames=['NoofShares','ClosePrice','TotalTurnoverRs',
                                     'LowPrice','HighPrice','SpreadCloseOpen'],
                           groupBy=['unqWeek'], orderBy=['unqWeek'],
                           conditions=['1=1'])

  6. Simple Visualization
      #Graph low price against week-id
      gr = dt3.graph1D(
            outfile='o.db',
            varNameX='unqWeek',
            varNameY='LowPrice', 
            plotName='lPrice_Graph',
            plotTitle='Low Price (Weekly)'
          )
    

      #Graph high price against week-id
      gr2 = dt3.graph1D(
            outfile='o.db',
            varNameX='unqWeek',
            varNameY='HighPrice',
            plotName='hPrice_Graph',
            plotTitle='High Price (Weekly)'
          )

      #Create the canvas and draw both graphs on it
      fig = plt.figure(facecolor='white')
      ax = fig.add_subplot(1,1,1)
      gr.draw(ax)
      gr2.draw(ax)

  7. Cosmetic Settings
      #Logic to turn the ugly weekId (201245) into a readable marker 2012-45
      def make_label(value,pos):
        return str(int(value/100))+"-"+str(int(value-int(value/100)*100))
    

      #Set the x-axis labels using this logic
      ax.xaxis.set_major_formatter(ticker.FuncFormatter(make_label))

      #Add a grid
      ax.grid(True)

  8. Finally
      plt.show()
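As a quick standalone sanity check of the two helpers used above (the week marker from step 5 and the label formatter from step 7), the sketch below runs them on a sample date; the date '2012-11-07' is chosen only for illustration.

    from datetime import datetime

    def unqWeekMarker(date):
        # year*100 + week-of-year (%W), as in step 5
        year = int(datetime.strptime(str(date), '%Y-%m-%d').strftime('%Y'))
        week = int(datetime.strptime(str(date), '%Y-%m-%d').strftime('%W'))
        return 100*year + week

    def make_label(value, pos):
        # turn 201245 back into '2012-45', as in step 7
        return str(int(value/100)) + "-" + str(int(value - int(value/100)*100))

    marker = unqWeekMarker('2012-11-07')
    print(marker)                    # 201245
    print(make_label(marker, None))  # 2012-45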
      
