A repository that contains work done for two AI projects at Kin Keepers.

namiyousef/Kin-Keepers

Kin-Keepers

A repository that contains work done for two data science projects at Kin Keepers. I've uploaded this with permission from my supervisor, Mr. Juan Jimenez.

Note: this project was migrated into this repository with some modifications, and some of the content has been obfuscated on purpose.

Problem Description:

An Arduino constantly sends movement data (accelerometer and gyroscope readings) that is written to a CSV file. Much of this data is uninformative, since it only shows that the user is stationary. The goals of this project are as follows:

  • find thresholds that partition movement data into 'significant' and 'non-significant' (using machine learning techniques or otherwise)
  • use this 'filtered' data to find anomalies in time-series acceleration and gyration data, using machine learning methods or otherwise (e.g. if the user's average acceleration / gyration drops, an alarm should be triggered)
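As a rough illustration of the first goal, the sketch below flags a sample as 'significant' when the magnitude of either its acceleration or its gyration exceeds a threshold. The threshold values, the field names (`ax`, `gx`, ...) and the assumption that readings sit near zero when the user is stationary are all hypothetical; the actual thresholds come out of the clustering work.

```python
import math

# Hypothetical thresholds, for illustration only.
ACCEL_THRESHOLD = 0.2
GYRO_THRESHOLD = 15.0


def magnitude(x, y, z):
    """Euclidean norm of a three-axis reading."""
    return math.sqrt(x * x + y * y + z * z)


def is_significant(sample, accel_threshold=ACCEL_THRESHOLD,
                   gyro_threshold=GYRO_THRESHOLD):
    """Return True if either magnitude exceeds its threshold."""
    accel = magnitude(sample["ax"], sample["ay"], sample["az"])
    gyro = magnitude(sample["gx"], sample["gy"], sample["gz"])
    return accel > accel_threshold or gyro > gyro_threshold


def filter_significant(samples, **thresholds):
    """Keep only the samples classed as significant."""
    return [s for s in samples if is_significant(s, **thresholds)]
```

A check of this kind is cheap enough to run on the device itself, so that only significant samples would need to be transferred.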

A vote was taken on which movements were 'significant' and which were 'non-significant'. The results are shown below:

Significant data:

  • standing up and sitting down straight after
  • fetching a remote control that is not within arm's reach
  • lifting and lowering one's lower leg to stimulate blood circulation
  • walking
  • falling

Non-significant data:

  • crossing one's arms
  • crossing one's legs
  • switching seating positions
  • moving from a sitting position to a lying one

Note that the above were used as 'testing criteria' for the clustering model developed.

The first problem (clustering) can be found in the sub-directory 'Clustering movement data'. The second problem (anomaly detection) can be found in the sub-directory 'Anomaly Detection in Significant Movements'. Each sub-directory contains a summary document (called Summary.ipynb) that summarises the outcomes of the project. The most significant lesson learnt is that the data collection was poor, due to internal blockers met by the team.

Contents of repository:

Files:

requirements.txt: includes the requirements for the project. Note that I have Jupyter Notebook installed, which by default installs a number of different libraries. Not all of these are necessary, but to stay safe, it's best to install everything listed in requirements.txt (typically with pip install -r requirements.txt).

reproduce_project.pdf: a guide to reproducing the project. Note that this should be used in conjunction with both Summary.ipynb notebooks (or their PDF versions).

reproduce_project.pages: same as the above, but as an Apple Pages (.pages) file.

Directories:

Data: directory containing all the data used for research, the flow detection project or the movement data project. This data was generated by another team member.

  1. Filtered_data: contains the total data generated after filtering thresholds have been applied
  2. FlowSimulator: includes data from the flow simulation.
  3. Ignacio: contains all the data generated by Ignacio (.csv). Note that each filename represents the date the data was generated, and the string '_repet' is appended for files with inconsistent headers.
  4. Research data: this is data that was used for research purposes prior to project start.
  5. Rohan: same as the Ignacio sub-directory, except that the data was generated by Rohan.
  6. TestData: 'individual movements' generated by Ignacio, used to test the algorithm that finds the filter thresholds

Models: includes ML models saved as .pkl files

Research: includes work done while I had no access to data (i.e. preparation for when I did actually get the data). Note that some of the work is incomplete.

Clustering movement data: The first stage of the project involved clustering the movement data to separate 'significant' movements from stationary datapoints. The first couple of notebooks are exploratory, followed by modelling, verification and finally finding numerical thresholds to be incorporated on the Arduino itself (so that there is no constant data transfer).
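To illustrate how a clustering step can yield a single numerical threshold, here is a minimal pure-Python sketch of 1-D two-means clustering: it splits a list of movement magnitudes into a 'low' and a 'high' cluster and returns the midpoint between the two centroids. This is not necessarily the method used in the notebooks; the function name and the assumption that the input is a plain list of per-sample magnitudes are hypothetical.

```python
def two_means_threshold(values, iterations=100):
    """Cluster 1-D values into two groups; return the midpoint of the centroids."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values are all identical, nothing to separate")

    for _ in range(iterations):
        boundary = (low + high) / 2
        lower = [v for v in values if v <= boundary]
        upper = [v for v in values if v > boundary]
        # Both groups are always non-empty: the minimum is at or below the
        # boundary and the maximum is above it.
        new_low = sum(lower) / len(lower)
        new_high = sum(upper) / len(upper)
        if new_low == low and new_high == high:
            break  # centroids stopped moving
        low, high = new_low, new_high

    return (low + high) / 2


# Example: mostly-stationary readings plus a few larger movements.
magnitudes = [0.01, 0.02, 0.03, 0.5, 0.6, 0.7]
threshold = two_means_threshold(magnitudes)
```

The resulting value could then be hard-coded on the device as a simple comparison, which is far cheaper than running a model there.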

Anomaly Detection in Significant Movements: This folder includes notebooks on finding anomalies in time-series data (the next step after clustering the movement data). The first couple of notebooks are exploratory, but there is modelling too. The most successful model uses a moving average. The other methods considered were not as insightful.
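The moving-average idea can be sketched as follows: compute a trailing mean over the filtered movement magnitudes and raise an alarm whenever it falls below some fraction of a known baseline. The window size, the baseline and the `ratio` parameter are assumptions made for illustration, and the function names are hypothetical; the approach actually used is documented in anomaly_detection.py.

```python
from collections import deque


def moving_average(values, window):
    """Yield the trailing mean of the last `window` values once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # oldest value is dropped automatically
    for v in values:
        buf.append(v)
        if len(buf) == window:
            yield sum(buf) / window


def low_activity_alarms(values, window, baseline, ratio=0.5):
    """Return indices into `values` where the moving average is below ratio * baseline."""
    limit = ratio * baseline
    return [
        i + window - 1  # index of the last sample in the window
        for i, avg in enumerate(moving_average(values, window))
        if avg < limit
    ]


# Example: activity drops from about 1.0 to about 0.2 partway through.
readings = [1.0, 1.0, 1.0, 1.0, 0.2, 0.2, 0.2, 0.2]
alarms = low_activity_alarms(readings, window=3, baseline=1.0)
```

Averaging over a window means that a single quiet sample does not trigger an alarm; only a sustained drop does.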

Here are some important considerations:

  1. Abnormal data is needed for testing purposes
  2. There may be virtue in cutting out acceleration data points > 0.7, as they seem anomalous. However, there is no need to rush into this: more analysis is needed first (looking into the second dataset, as well as how gyration impacts this)
  3. The data collection has not provided us with the insight that we needed to understand the movements (this may be due to an error in the way data is being collected on the Arduino, but there is little evidence to support this claim)
  4. Important aspects and methods have been documented; these can be found within anomaly_detection.py
  5. Future considerations can be found in the relevant Summary.ipynb file (or its PDF version)

Flow Detection: not part of the problem description above. Brief: given simulation data that shows water flow inside a dwelling as a function of time, can anomalies in this data (underflow, leak or open tap) be determined? Much of my work here is exploratory, as I was providing insight into another team member's problem.

Python_libs: includes .py files of relevant libraries / guides. Includes sub-directories for the clustering and anomaly_detection problems. These contain all the Jupyter notebooks as .py files.
