mseyfayi/datamining_course_utils

Data-mining Course Utils

A personal project that implements data-mining algorithms in Python.

Functions

sub_series

  • Takes a pandas Series, computes its average (mean), and returns a new series in which the mean has been subtracted from every element
  • For example:

sr = [3, 4, 1, 2, 0]
sr.mean = (3+4+1+2+0)/sr.len = 10/5 = 2
result = [3-2, 4-2, 1-2, 2-2, 0-2] = [1, 2, -1, 0, -2]
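
The mean-centering step above can be sketched in a few lines of pandas. This is a minimal illustration rather than the repository's actual code; the name `sub_series_sketch` is hypothetical.

```python
import pandas as pd


def sub_series_sketch(sr: pd.Series) -> pd.Series:
    """Hypothetical stand-in for sub_series: subtract the mean from every element."""
    return sr - sr.mean()


sr = pd.Series([3, 4, 1, 2, 0])
print(sub_series_sketch(sr).tolist())  # [1.0, 2.0, -1.0, 0.0, -2.0]
```

The result is a float series because the mean is a float, even when the input holds integers.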

cov_series

  • Takes two pandas Series and returns their covariance
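  • Covariance formula (population form, dividing by the number of elements n, as in the example below):
    $$\operatorname{Cov}(sr1, sr2) = \frac{1}{n} \sum_{i=1}^{n} (sr1_i - \overline{sr1})(sr2_i - \overline{sr2})$$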
  • sr1 and sr2 must have the same length
  • For example:

sr1 = pd.Series([3, 4, 1, 2, 0])
sr1 - sr1.mean = 1, 2, -1, 0, -2
sr2 = pd.Series([1, 3, 0, 4, 2])
sr2 - sr2.mean = -1, 1, -2, 2, 0
result = ((1*-1) + (2*1) + (-1*-2) + (0*2) + (-2*0)) / sr1.len
          = (-1 + 2 + 2 + 0 + 0) / 5 = 3 / 5 = 0.6
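
The worked example above can be reproduced with a short sketch. It assumes the population form of covariance (dividing by the number of elements, as the example does); the name `cov_series_sketch` is hypothetical and the repository's own implementation may differ.

```python
import pandas as pd


def cov_series_sketch(sr1: pd.Series, sr2: pd.Series) -> float:
    """Hypothetical stand-in for cov_series: population covariance (divide by n)."""
    if len(sr1) != len(sr2):
        raise ValueError("sr1 and sr2 must have the same length")
    # Work on raw arrays so the products are positional, not index-aligned.
    d1 = sr1.to_numpy(dtype=float) - sr1.mean()
    d2 = sr2.to_numpy(dtype=float) - sr2.mean()
    return float((d1 * d2).sum() / len(sr1))


sr1 = pd.Series([3, 4, 1, 2, 0])
sr2 = pd.Series([1, 3, 0, 4, 2])
print(cov_series_sketch(sr1, sr2))  # 0.6
```

Note that pandas' built-in `sr1.cov(sr2)` divides by n - 1 by default, so it returns 0.75 for the same data.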

create_cov_matrix

Creates the covariance matrix of the columns of a matrix (pandas.DataFrame)

Covariance Matrix:

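|     | 0      | 1      | ... | n      |
|-----|--------|--------|-----|--------|
| 0   | Cov_00 | Cov_01 | ... | Cov_0n |
| 1   | Cov_10 | Cov_11 | ... | Cov_1n |
| ... | ...    | ...    | ... | ...    |
| n   | Cov_n0 | Cov_n1 | ... | Cov_nn |

n is the index of the last column (columns are indexed from 0, so the input has n + 1 columns)
Cov_ij is the covariance of column i and column j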

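For example:
Input matrix:

|   | 1 | 2 |
|---|---|---|
| 0 | 3 | 1 |
| 1 | 4 | 3 |
| 2 | 1 | 0 |
| 3 | 2 | 4 |
| 4 | 0 | 2 |

Output:

|   | 1   | 2   |
|---|-----|-----|
| 1 | 2   | 0.6 |
| 2 | 0.6 | 2   |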
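
One way to obtain this output is to center every column and take the matrix product of the centered data with itself, divided by the number of rows. The sketch below assumes the population form (divide by n) used in the cov_series example; `create_cov_matrix_sketch` is a hypothetical name, not the repository's function.

```python
import pandas as pd


def create_cov_matrix_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for create_cov_matrix: population covariance of the columns."""
    centered = df - df.mean()  # subtract each column's mean
    # (columns x rows) . (rows x columns) gives a columns x columns matrix
    return centered.T.dot(centered) / len(df)


df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
cov = create_cov_matrix_sketch(df)
print(cov)  # diagonal entries 2.0, off-diagonal entries 0.6
```

The result keeps the input's column labels on both axes, which matches the layout of the output table above.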

_eigenvalue

get_eigenvalue

  1. Takes a matrix (pandas.DataFrame)
  2. Transforms it into its covariance matrix (using create_cov_matrix)
  3. Returns the eigenvalues and eigenvectors of that covariance matrix (using _eigenvalue)
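
The three steps above can be sketched with NumPy. The README does not say how _eigenvalue computes its result, so the use of `numpy.linalg.eigh` here is an assumption (it suits symmetric matrices such as a covariance matrix), and `get_eigen_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def get_eigen_sketch(df: pd.DataFrame):
    """Hypothetical stand-in for get_eigenvalue: eigen-decompose the covariance matrix."""
    centered = (df - df.mean()).to_numpy(dtype=float)
    cov = centered.T @ centered / len(df)  # population covariance (divide by n)
    # eigh returns eigenvalues in ascending order; eigenvectors are the columns.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return eigenvalues, eigenvectors


df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
values, vectors = get_eigen_sketch(df)
print(values)  # approximately [1.4, 2.6]
```

The sign of each eigenvector is not unique, so two correct implementations may return vectors that differ by a factor of -1.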

get_pca

Takes a matrix (pandas.DataFrame) and returns its PCA (principal component analysis)
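
The README does not specify what get_pca returns. As a rough sketch, assuming it returns the data projected onto the eigenvectors of the covariance matrix, ordered by decreasing eigenvalue, the computation could look like this. The name `pca_sketch` and the `n_components` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def pca_sketch(df: pd.DataFrame, n_components=None) -> pd.DataFrame:
    """Hypothetical PCA: project centered data onto the leading eigenvectors."""
    centered = (df - df.mean()).to_numpy(dtype=float)
    cov = centered.T @ centered / len(df)  # population covariance (divide by n)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    components = eigenvectors[:, order][:, :n_components]
    return pd.DataFrame(centered @ components, index=df.index)


df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
scores = pca_sketch(df)
print(scores.var(ddof=0).tolist())  # approximately [2.6, 1.4]
```

Under this assumption, the variance of each projected column equals the corresponding eigenvalue, which gives a quick sanity check.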

correlation_series

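  • Takes two pandas Series and calculates their Pearson correlation
  • Pearson correlation formula, where σ denotes the standard deviation:
    $$r(sr1, sr2) = \frac{\operatorname{Cov}(sr1, sr2)}{\sigma_{sr1} \, \sigma_{sr2}}$$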
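
As a sketch of the formula above, the correlation is the covariance divided by the product of the two standard deviations. The name `correlation_series_sketch` is hypothetical; the data is the same pair of series used in the cov_series example.

```python
import numpy as np
import pandas as pd


def correlation_series_sketch(sr1: pd.Series, sr2: pd.Series) -> float:
    """Hypothetical stand-in for correlation_series: Pearson correlation."""
    d1 = sr1.to_numpy(dtype=float) - sr1.mean()
    d2 = sr2.to_numpy(dtype=float) - sr2.mean()
    cov = (d1 * d2).mean()            # population covariance
    std1 = np.sqrt((d1 ** 2).mean())  # population standard deviations
    std2 = np.sqrt((d2 ** 2).mean())
    return float(cov / (std1 * std2))


sr1 = pd.Series([3, 4, 1, 2, 0])
sr2 = pd.Series([1, 3, 0, 4, 2])
print(correlation_series_sketch(sr1, sr2))  # approximately 0.3
```

Because the same divisor appears in the numerator and the denominator, the result agrees with pandas' built-in `sr1.corr(sr2)` regardless of whether n or n - 1 is used.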

correlation_frame

Takes a matrix (pandas.DataFrame) and returns the correlation matrix of its columns
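
A possible sketch, assuming the correlation matrix is built by dividing each covariance entry by the product of the two columns' standard deviations. The name `correlation_frame_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_frame_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for correlation_frame: Pearson correlation of the columns."""
    centered = (df - df.mean()).to_numpy(dtype=float)
    cov = centered.T @ centered / len(df)  # population covariance matrix
    std = np.sqrt(np.diag(cov))            # per-column standard deviation
    corr = cov / np.outer(std, std)        # entry (i, j) divided by std_i * std_j
    return pd.DataFrame(corr, index=df.columns, columns=df.columns)


df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
print(correlation_frame_sketch(df))  # diagonal approximately 1.0, off-diagonal approximately 0.3
```

For data without missing values this should match pandas' built-in `df.corr()`.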
