Kin-Keepers

A repository that contains work done for two data science projects at Kin Keepers. I've uploaded this with permission from my supervisor, Mr. Juan Jimenez.

Note: this project was migrated into this repository with some modifications. Some of the content has been obfuscated on purpose.

Problem Description:

An Arduino continuously sends movement data (accelerometer and gyroscope readings), which is written to a CSV file. Much of this data is uninformative, since it only shows that the user is stationary. The goals of this project are as follows:

Find thresholds that partition movement data into 'significant' and 'non-significant' (using machine learning techniques or otherwise).
Use this 'filtered' data to find anomalies in time-series acceleration and gyration data, using machine learning methods or otherwise (i.e. if the user has lower acceleration / gyration on average, an alarm should be triggered).
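The first goal can be pictured with a minimal sketch. It assumes each sample is a dict with hypothetical keys `ax`, `ay`, `az` (acceleration) and `gx`, `gy`, `gz` (gyration), that readings are expressed as deviations from rest, and that one threshold per sensor is enough. The threshold values below are placeholders, not the ones found in this project.

```python
import math

# Placeholder thresholds for illustration only; the project's real values
# come out of the clustering notebooks.
ACCEL_THRESHOLD = 0.15
GYRO_THRESHOLD = 20.0


def magnitude(x, y, z):
    """Euclidean norm of a 3-axis reading."""
    return math.sqrt(x * x + y * y + z * z)


def is_significant(sample, accel_threshold=ACCEL_THRESHOLD,
                   gyro_threshold=GYRO_THRESHOLD):
    """Return True if either sensor magnitude exceeds its threshold."""
    accel = magnitude(sample["ax"], sample["ay"], sample["az"])
    gyro = magnitude(sample["gx"], sample["gy"], sample["gz"])
    return accel > accel_threshold or gyro > gyro_threshold


def partition(samples, **thresholds):
    """Split samples into (significant, non_significant) lists."""
    significant, non_significant = [], []
    for sample in samples:
        if is_significant(sample, **thresholds):
            significant.append(sample)
        else:
            non_significant.append(sample)
    return significant, non_significant
```

Only the samples in the first list would then be passed on to the anomaly detection stage.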

A vote was taken on which movements were 'significant' and which were 'non-significant'. The results are shown below:

Significant data:

standing up and sitting down straight after
fetching a remote control that is not within arm's reach
lifting and lowering one's lower leg to stimulate blood circulation
walking
falling

Non-significant data:

crossing one's arms
crossing one's legs
switching seating positions
moving from a sitting position to a lying one

Note that the above were used as 'testing criteria' for the clustering model developed.

The first problem (clustering) can be found within the sub-directory 'Clustering movement data'. The second problem (anomaly detection) can be found within the sub-directory 'Anomaly Detection in Significant Movements'. Each of these sub-directories contains a summary document (called Summary.ipynb) that summarises the outcomes of the project. The most significant lesson learnt is that the quality of the data collection was poor, due to internal blockers met by the team.

Contents of repository:

Files:

requirements.txt: lists the requirements for the project. Note that I have Jupyter notebooks installed, which by default install a number of different libraries. Not all of these are necessary, but to stay safe, it's best to install everything listed in requirements.txt.

reproduce_project.pdf: a guide to reproducing the project. Note that this should be used in conjunction with both Summary.ipynb notebooks (or their PDF versions).

reproduce_project.pages: same as the above, but as an Apple Pages (.pages) file.

Directories:

Data: directory containing all the data used for research, the flow detection project or the movement data project. This data was generated by other team members.

Filtered_data: contains the total data generated after filtering thresholds have been applied
FlowSimulator: includes data from the flow simulation.
Ignacio: contains all the data generated by Ignacio (.csv). Note that the filename represents the date data was generated, and a string '_repet' is added for those with inconsistent headers.
Research data: this is data that was used for research purposes prior to project start.
Rohan: same as Ignacio sub-directory, except data generated by Rohan.
TestData: this represents 'individual-movements' generated by Ignacio, used to test the algorithm finding the filter thresholds
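The `_repet` naming convention makes it easy to separate files with inconsistent headers before loading them. The sketch below is only illustrative: the example filenames in the usage note are hypothetical, since the exact date format used in the filenames is not stated here.

```python
from pathlib import Path


def split_by_header_consistency(filenames):
    """Split filenames into (consistent, inconsistent) header groups.

    A file counts as 'inconsistent' if its stem contains the marker '_repet'.
    """
    consistent, inconsistent = [], []
    for name in filenames:
        if "_repet" in Path(name).stem:
            inconsistent.append(name)
        else:
            consistent.append(name)
    return consistent, inconsistent
```

For example, `split_by_header_consistency(["day1.csv", "day2_repet.csv"])` puts the second (hypothetical) file in the inconsistent group, so it can be given separate header handling.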

Models: includes ML models saved as .pkl files
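A saved model can be restored with Python's standard `pickle` module. This is a minimal sketch: `load_model` is a hypothetical helper, not something defined in this repository, and unpickling should only be done on files you trust. Loading a pickled model generally also requires the library that created it to be installed.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled object from disk. Only unpickle files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```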

Research: includes work done while I had no access to data (i.e. preparation for when I did actually get the data). Note that some of the work is incomplete.

Clustering movement data: The first stage of the project involved clustering the movement data to separate 'significant' movements from stationary datapoints. The first couple of notebooks are exploratory, followed by modelling, verification and finally finding numerical thresholds to be incorporated on the Arduino itself (so that there is no constant data transfer).
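One simple way to turn a clustering result into a numerical threshold is sketched below. This is an assumption made for illustration, not necessarily the method used in the notebooks: it runs a one-dimensional k-means with two clusters on movement magnitudes and returns the midpoint between the two centroids. The function name `two_means_threshold` is hypothetical.

```python
def two_means_threshold(values, iterations=50):
    """1-D k-means with k=2; returns the midpoint between the two centroids."""
    values = sorted(values)
    if len(values) < 2 or values[0] == values[-1]:
        raise ValueError("need at least two distinct values")

    # Initialise the centroids at the extremes of the data.
    low, high = values[0], values[-1]
    for _ in range(iterations):
        boundary = (low + high) / 2
        low_cluster = [v for v in values if v <= boundary]
        high_cluster = [v for v in values if v > boundary]
        new_low = sum(low_cluster) / len(low_cluster)
        new_high = sum(high_cluster) / len(high_cluster)
        if (new_low, new_high) == (low, high):
            break  # converged: assignments no longer change
        low, high = new_low, new_high
    return (low + high) / 2
```

A single number like this is cheap to evaluate on a microcontroller, which is the point of moving the filter onto the Arduino.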

Anomaly Detection in Significant Movements: This folder includes notebooks on finding anomalies in time-series data (the next step after clustering the movement data). The first couple of notebooks are exploratory, followed by modelling. The most successful model uses a moving average. The other methods considered were less insightful.
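The moving-average idea can be sketched as follows. This is a simplified illustration, not the code in the notebooks: the window size, the `baseline` value and the `ratio` parameter are all assumptions, and the alarm rule (trailing average falling below a fraction of a baseline) is one possible reading of "lower acceleration / gyration on average".

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    buffer = deque(maxlen=window)
    total = 0.0
    averages = []
    for value in values:
        if len(buffer) == window:
            total -= buffer[0]  # oldest value is about to be evicted
        buffer.append(value)
        total += value
        if len(buffer) == window:
            averages.append(total / window)
    return averages


def low_activity_alarms(values, window, baseline, ratio=0.5):
    """Indices into `values` where the trailing average is below ratio * baseline."""
    averages = moving_average(values, window)
    return [i + window - 1
            for i, average in enumerate(averages)
            if average < ratio * baseline]
```

Here `baseline` would typically be the user's usual average magnitude, estimated from earlier data.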

Here are some important considerations:

Abnormal data is needed for testing purposes
There may be value in cutting out acceleration data points > 0.7, as they seem anomalous. However, this should not be rushed: more analysis is needed first (looking into the second dataset, as well as how gyration impacts this)
The data collection has not provided us with the insight that we needed to understand the movements (may be an error with the way data is being collected on the Arduino, but there is little evidence to support this claim)
Important aspects and methods have been documented; these can be found within anomaly_detection.py
Future considerations can be found under the relevant Summary.ipynb file (or its PDF version)
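Before committing to the 0.7 cut-off mentioned above, it can help to check how many points it would actually remove. The sketch below only counts them and drops nothing. `summarize_cutoff` is a hypothetical helper, and it assumes the acceleration values are already in the same units as the 0.7 figure.

```python
def summarize_cutoff(accelerations, cutoff=0.7):
    """Report how many acceleration values lie strictly above the cutoff."""
    above = [a for a in accelerations if a > cutoff]
    total = len(accelerations)
    return {
        "total": total,
        "above_cutoff": len(above),
        "fraction_above": len(above) / total if total else 0.0,
    }
```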

Flow Detection: not part of the problem description above. The brief: given simulation data that shows water flow inside a dwelling as a function of time, can anomalies in this data (underflow, leak or open tap) be determined? Much of my work here is exploratory, as I was providing insight into another team member's problem.

Python_files: includes .py files of relevant libraries / guides. Includes sub-directories for the clustering and anomaly_detection problems. These contain all the Jupyter notebooks as .py files.

namiyousef/Kin-Keepers

Folders and files

Latest commit

History

Repository files navigation

Kin-Keepers

Problem Description:

Contents of repository:

Files:

Directories:

About

Resources

Stars

Watchers

Forks

Languages