Welcome to the Look-A-Like Python Package

Author: Edward Turner

Introduction

Generally, we want to predict various characteristics, perhaps simultaneously, while ensuring that samples in the testing dataset that "look like" samples in the training dataset receive similar predicted values. Many predictive methods exist today and are well documented. However, few of them can ensure that samples from the testing dataset with features similar to those in the training dataset have similar predicted values.

This Python package delivers a highly sought-after methodology: it uses the relative importance of each feature in predicting our chosen value, scales the features according to their importance, and then runs a nearest-neighbors algorithm to generate our matches.

A fuller description of the methodology can be found in the Methodology section.

Installation

One way to install the package, whether in a virtual environment or on your local machine, is to git clone the repository, change into the python-package directory, and run python setup.py install.
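The steps above look roughly like this; the repository URL is inferred from the repository name ed-turner/look-a-like and may differ from the actual location:

```shell
# clone the repository (URL assumed from the repo name ed-turner/look-a-like)
git clone https://github.com/ed-turner/look-a-like.git
cd look-a-like/python-package

# install into the active (virtual) environment
python setup.py install
```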

Methodology

As mentioned in the introduction, we derive values based on the predictive power of each feature and scale the features by those values. To do so, we fit a Light Gradient Boosting Machine (LGBM) to the training dataset. We optimize the LGBM using Bayesian hyperparameter optimization on a train/validation split of the original training dataset. Once optimized, we fit on the entire training dataset, which yields an importance score for each feature. We then rescale the feature importances so that they are nonzero and sum to one. This is the very first step.
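A minimal sketch of the importance-normalization step described above, assuming the raw importance scores have already been extracted from a fitted LGBM (the scores below are made up for illustration):

```python
import numpy as np

# hypothetical raw importance scores from a fitted gradient-boosting model;
# some features may receive a raw score of exactly zero
raw_importances = np.array([120.0, 45.0, 0.0, 15.0])

# shift by a small epsilon so every weight is strictly positive,
# then normalize so the weights sum to one
eps = 1e-6
weights = raw_importances + eps
weights = weights / weights.sum()
```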

Once we have the feature importances, we standardize our features and then scale them by their importances. Several distance measures are available for the matching algorithm, along with different ways to find the closest neighbors. For the distance calculation, we have the p-norm measure, the Mahalanobis measure, and the cosine measure; for the nearest-neighbors algorithm, we have the k-nearest-neighbors algorithm and the Hungarian matching algorithm. This gives a total of six types of matching algorithms.
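A rough sketch of one of the six combinations, under stated assumptions: features are standardized with the training statistics, scaled by the normalized importance weights, and each test sample is matched to its single nearest training neighbor under a weighted Euclidean (p = 2) distance. All names here are illustrative, not the package's API.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
X_test = rng.normal(size=(10, 3))
y_train = rng.normal(size=50)

# illustrative importance weights (nonzero, summing to one)
weights = np.array([0.5, 0.3, 0.2])

# standardize both sets using the training statistics
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mu) / sigma
Z_test = (X_test - mu) / sigma

# scale each feature by its importance weight
Z_train = Z_train * weights
Z_test = Z_test * weights

# nearest training neighbor for each test sample (p-norm with p = 2)
dists = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=-1)
matches = dists.argmin(axis=1)

# predicted value for each test sample: label of its matched training sample
y_pred = y_train[matches]
```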

Tutorial

To use this model, simply follow this short example:

from lal import LALGBRegressor

# to use linear sum assignment for matches,
# pass "linear_sum" to k;
# and to use the cosine measure,
# pass "cosine" to p
model_params = {
                "k": "linear_sum", 
                "p": "cosine"
                }

model = LALGBRegressor(**model_params)

model.fit(train_data, train_labels)

test_labels = model.predict(
                            train_data, 
                            train_labels, 
                            test_data
                            )

As a note, it is suggested that all missing values are handled before using the model.
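For example, one simple way to handle missing values beforehand is mean imputation; this is an assumed preprocessing choice, not something the package performs for you:

```python
import numpy as np

# toy feature matrix with a missing entry
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, 6.0]])

# replace each NaN with its column mean, computed over observed values only
col_means = np.nanmean(X, axis=0)
nan_rows, nan_cols = np.where(np.isnan(X))
X[nan_rows, nan_cols] = col_means[nan_cols]
```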

Documentation

For code documentation, please go here.

Or have a look at the code repository.

License

This work is dual-licensed under Apache 2.0 and GPL 2.0 (or any later version). You may choose either license when you use this work.

SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later
