A personal project to implement data-mining algorithms in Python
- sub_series
- cov_series
- create_cov_matrix
- _eigenvalue
- get_eigenvalue
- get_pca
- correlation_series
- correlation_frame
- Tests: test.py
- Takes a pandas Series, calculates its average (mean), and returns the series with the mean subtracted from every element
- For example:
sr = [3, 4, 1, 2, 0]
sr.mean = (3+4+1+2+0)/sr.len = 10/5 = 2
result = [3-2, 4-2, 1-2, 2-2, 0-2] = [1, 2, -1, 0, -2]
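A minimal sketch of this centering step, assuming the input is a numeric pandas Series. It illustrates the idea and is not necessarily how the project implements `sub_series`:

```python
import pandas as pd


def sub_series(sr: pd.Series) -> pd.Series:
    """Return the series centered on its mean (each element minus the mean)."""
    # Broadcasting subtracts the scalar mean from every element.
    return sr - sr.mean()


if __name__ == "__main__":
    print(sub_series(pd.Series([3, 4, 1, 2, 0])).tolist())  # [1.0, 2.0, -1.0, 0.0, -2.0]
```

The result is float even for integer input, because the mean is a float.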
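- Takes two pandas Series and returns their covariance
- Covariance formula: $\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, where $n$ is the length of the series
- sr1 and sr2 must have the same length
- For example: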
sr1 = pd.Series([3, 4, 1, 2, 0])
sr1 - sr1.mean = 1, 2, -1, 0, -2
sr2 = pd.Series([1, 3, 0, 4, 2])
sr2 - sr2.mean = -1, 1, -2, 2, 0
result = ((1 × -1) + (2 × 1) + (-1 × -2) + (0 × 2) + (-2 × 0)) / sr1.len
= (-1 + 2 + 2 + 0 + 0) / 5 = 3 / 5 = 0.6
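A sketch of the same computation, assuming numeric series and the divide-by-n convention used in the worked example. Converting to NumPy arrays first is a design choice of this sketch: it stops pandas from aligning the two series by index, which would otherwise produce NaN for series with different indices.

```python
import numpy as np
import pandas as pd


def cov_series(sr1: pd.Series, sr2: pd.Series) -> float:
    """Population covariance of two equal-length series (divides by n, not n - 1)."""
    if len(sr1) != len(sr2):
        raise ValueError("sr1 and sr2 must have the same length")
    # Plain arrays: pair elements by position, not by index label.
    x = sr1.to_numpy(dtype=float)
    y = sr2.to_numpy(dtype=float)
    # Sum of products of deviations from each mean, divided by n.
    return float(np.sum((x - x.mean()) * (y - y.mean())) / len(x))


if __name__ == "__main__":
    print(cov_series(pd.Series([3, 4, 1, 2, 0]), pd.Series([1, 3, 0, 4, 2])))  # 0.6
```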
Creates a covariance matrix from a matrix (pandas.DataFrame)
Covariance matrix:
| | 0 | 1 | ... | n |
|---|---|---|---|---|
| 0 | Cov00 | Cov01 | ... | Cov0n |
| ... | ... | ... | ... | ... |
| n | Covn0 | Covn1 | ... | Covnn |
n is the index of the last column (so the input has n + 1 columns)
Covij is the covariance of column i and column j
For example:
Input matrix:

| | 1 | 2 |
|---|---|---|
| 0 | 3 | 1 |
| 1 | 4 | 3 |
| 2 | 1 | 0 |
| 3 | 2 | 4 |
| 4 | 0 | 2 |

Output:

| | 1 | 2 |
|---|---|---|
| 1 | 2 | 0.6 |
| 2 | 0.6 | 2 |
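
One way to build this matrix is to center every column and compute `centered.T @ centered / n`, which yields all pairwise covariances in one step. This is a sketch assuming numeric columns and the same divide-by-n convention as the `cov_series` example, not the project's actual code:

```python
import numpy as np
import pandas as pd


def create_cov_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Covariance matrix of df's columns; entry (i, j) is cov(column i, column j)."""
    values = df.to_numpy(dtype=float)
    centered = values - values.mean(axis=0)  # subtract each column's mean
    n = values.shape[0]                      # number of rows (observations)
    cov = centered.T @ centered / n          # divide by n, as in the formula above
    return pd.DataFrame(cov, index=df.columns, columns=df.columns)


if __name__ == "__main__":
    df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
    # Values should match the example output: 2 and 0.6 off the diagonal.
    print(create_cov_matrix(df))
```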
- Takes a matrix (pandas.DataFrame) and returns its eigenvalues and eigenvectors, sorted by eigenvalue in descending order
- Uses numpy.linalg.eigh, which assumes a symmetric (Hermitian) matrix such as a covariance matrix
- Takes a matrix (pandas.DataFrame)
- Transforms it into its covariance matrix (using create_cov_matrix)
- Returns the eigenvalues and eigenvectors of that covariance matrix (using _eigenvalue)
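The two steps above can be chained as follows. To keep the sketch self-contained, it swaps in `numpy.cov(..., rowvar=False, bias=True)` (columns as variables, divide by n) in place of `create_cov_matrix`, and inlines the sort from `_eigenvalue`; it assumes at least two numeric columns:

```python
import numpy as np
import pandas as pd


def get_eigenvalue(df: pd.DataFrame):
    """Eigen-decompose the covariance matrix of df's columns, largest eigenvalue first."""
    # Step 1: population covariance matrix; rowvar=False treats columns as variables.
    cov = np.cov(df.to_numpy(dtype=float), rowvar=False, bias=True)
    # Step 2: symmetric eigen-decomposition, then sort eigenpairs in descending order.
    values, vectors = np.linalg.eigh(cov)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


if __name__ == "__main__":
    df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
    print(get_eigenvalue(df)[0])  # approximately [2.6 1.4]
```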
Takes a matrix (pandas.DataFrame) and returns its principal component analysis (PCA)
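The README does not say what form the PCA result takes. This sketch assumes it means the centered data projected onto the leading eigenvectors of the covariance matrix; the `n_components` parameter and the `PC1`, `PC2`, ... column names are hypothetical:

```python
import numpy as np
import pandas as pd


def get_pca(df: pd.DataFrame, n_components: int = 2) -> pd.DataFrame:
    """Project df onto its top principal components (n_components is hypothetical)."""
    values = df.to_numpy(dtype=float)
    centered = values - values.mean(axis=0)        # PCA works on mean-centered data
    cov = centered.T @ centered / len(values)      # population covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest-variance directions first
    scores = centered @ eigvecs[:, order]          # coordinates along each component
    return pd.DataFrame(scores, index=df.index,
                        columns=[f"PC{i + 1}" for i in range(len(order))])


if __name__ == "__main__":
    df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
    print(get_pca(df))
```

The variance of each score column equals the corresponding eigenvalue, and the sign of each component is arbitrary, because an eigenvector's sign is not unique.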
Takes a matrix (pandas.DataFrame) and returns the correlation matrix of its columns
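A sketch of a Pearson correlation matrix built from pairwise correlations. The README lists `correlation_series` without describing it, so its two-series signature here is an assumption. The 1/n factors cancel, so the result does not depend on the divide-by-n convention:

```python
import numpy as np
import pandas as pd


def correlation_series(sr1: pd.Series, sr2: pd.Series) -> float:
    """Pearson correlation of two equal-length series (hypothetical signature)."""
    x = sr1.to_numpy(dtype=float)
    y = sr2.to_numpy(dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    # cov(x, y) / (std(x) * std(y)); the 1/n factors cancel out.
    return float((dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy)))


def correlation_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Correlation matrix: entry (i, j) is the correlation of column i and column j."""
    cols = df.columns
    data = [[correlation_series(df[a], df[b]) for b in cols] for a in cols]
    return pd.DataFrame(data, index=cols, columns=cols)


if __name__ == "__main__":
    df = pd.DataFrame({1: [3, 4, 1, 2, 0], 2: [1, 3, 0, 4, 2]})
    print(correlation_frame(df))  # off-diagonal 0.6 / (sqrt(2) * sqrt(2)) = 0.3
```

A constant column has zero standard deviation, so its correlations are undefined (NaN, with a NumPy runtime warning).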