Type4Py: Deep Similarity Learning-Based Type Inference for Python

This repository contains the implementation of Type4Py and instructions for reproducing the results of the paper.

Dataset

The Type4Py dataset can be downloaded from here. It contains around 4,910 Python projects that were cloned from GitHub in October 2019.

Code de-duplication

As in the paper, the dataset must be de-duplicated to avoid duplication bias when training and testing the model. See the CD4Py tool for code de-duplication.
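To illustrate what "duplication" means here, the sketch below groups files that are identical once comments, blank lines, and spacing are ignored. It is a minimal, stdlib-only illustration of exact-duplicate detection and is not CD4Py's algorithm (CD4Py also targets near-duplicates). The function names and the path-to-source mapping are hypothetical.

```python
import hashlib
import io
import tokenize
from collections import defaultdict
from typing import Dict, List

# Hypothetical normalization: keep logical structure, drop comments and blank lines.
STRUCTURAL = {tokenize.NEWLINE: "<NL>", tokenize.INDENT: "<IN>", tokenize.DEDENT: "<DE>"}
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.ENDMARKER}


def normalized_fingerprint(source: str) -> str:
    """SHA-256 of a file's token stream, ignoring comments, blank lines and spacing."""
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        # Map indentation/newline tokens to fixed markers so tabs vs. spaces do not matter.
        parts.append(STRUCTURAL.get(tok.type, tok.string))
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def find_exact_duplicates(files: Dict[str, str]) -> List[List[str]]:
    """Group paths (hypothetical mapping: path -> source) with identical fingerprints."""
    groups = defaultdict(list)
    for path, source in files.items():
        groups[normalized_fingerprint(source)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

For example, `def f(x):\n    return x + 1\n` and the same function with an extra comment and a blank line land in one group. Near-duplicates (renamed variables, small edits) are not caught, which is why a dedicated tool is recommended.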

Installation Guide

Requirements

Linux-based OS
Python 3.5 or newer
An NVIDIA GPU with CUDA support
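A quick preflight check for the requirements above can save a failed run. The sketch below is a hypothetical helper, not part of Type4Py. The GPU test is only a heuristic: it looks for the `nvidia-smi` tool installed with the NVIDIA driver, which does not prove that a CUDA-enabled deep learning framework is installed.

```python
import platform
import shutil
import sys
from typing import List, Optional, Tuple


def check_requirements(version: Optional[Tuple[int, int]] = None,
                       system: Optional[str] = None,
                       has_nvidia_smi: Optional[bool] = None) -> List[str]:
    """Return unmet requirements; an empty list means every check passed.

    Arguments default to the current machine and can be overridden for testing.
    """
    version = version if version is not None else tuple(sys.version_info[:2])
    system = system if system is not None else platform.system()
    if has_nvidia_smi is None:
        # Heuristic: nvidia-smi ships with the NVIDIA driver.
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    problems = []
    if version < (3, 5):
        problems.append("Python 3.5 or newer is required")
    if system != "Linux":
        problems.append("a Linux-based OS is required")
    if not has_nvidia_smi:
        problems.append("no NVIDIA GPU driver (nvidia-smi) found")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing requirement:", problem)
```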

Quick Install

git clone https://github.com/saltudelft/type4py.git && cd type4py
pip install .

Usage Guide

Follow the below steps to train and evaluate the Type4Py model.

1. Extraction

$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES

Description:

$DATA_PATH: The path to the Python corpus or dataset.
$OUTPUT_DIR: The path to store processed projects.
$DUP_FILES: The path to the duplicate files. [Optional]
$CORES: Number of CPU cores to use for processing projects.
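If you drive the pipeline from Python, a small wrapper can assemble the extraction command and add `--d` only when a duplicate-files path is given. This is a minimal sketch that uses only the flags documented above; the function name and the example paths are hypothetical.

```python
import os
import subprocess
from typing import List, Optional


def build_extract_cmd(data_path: str, output_dir: str, cores: int,
                      dup_files: Optional[str] = None) -> List[str]:
    """Argument list for `type4py extract`; --d is included only when provided."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    cmd = ["type4py", "extract", "--c", data_path, "--o", output_dir]
    if dup_files is not None:
        cmd += ["--d", dup_files]
    cmd += ["--w", str(cores)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: replace them with your corpus and output locations.
    cmd = build_extract_cmd("/data/type4py_corpus", "/data/type4py_out", os.cpu_count() or 1)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if extraction fails
```

Passing an argument list (rather than a shell string) avoids quoting problems when paths contain spaces.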

2. Preprocessing

$ type4py preprocess --o $OUTPUT_DIR

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.

3. Vectorizing

$ type4py vectorize --o $OUTPUT_DIR

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.
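Preprocessing and vectorizing both take only the output directory from step 1, so they can be run back to back. A minimal sketch with hypothetical helper names, stopping at the first failing step:

```python
import os
import subprocess
from typing import List


def build_post_extract_cmds(output_dir: str) -> List[List[str]]:
    """Commands for steps 2 and 3, both pointing at the step-1 output directory."""
    return [
        ["type4py", "preprocess", "--o", output_dir],
        ["type4py", "vectorize", "--o", output_dir],
    ]


def run_post_extract(output_dir: str) -> None:
    """Run preprocessing, then vectorizing, after checking that step 1 produced output."""
    if not os.path.isdir(output_dir):
        raise FileNotFoundError("run the extraction step first: " + output_dir)
    for cmd in build_post_extract_cmds(output_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```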

4. Learning

$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.
--c: Trains the model for the combined prediction task. Use --a or --r instead for the argument or return type prediction task, respectively.
--p $PARAM_FILE: The path to a file with user-provided hyper-parameters for the model. See this file as an example. [Optional]
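This README does not describe the model internals. As an assumption about the general technique named in the title (not Type4Py's training code), deep similarity learning models are commonly trained with a triplet loss: an anchor sample should be closer to a sample of the same type than to one of a different type, by at least a margin. The margin value and the 2-D vectors below are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    Zero once the same-type pair is closer than the different-type pair
    by at least `margin`; positive otherwise, which drives training.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

With anchor `(0, 0)` and positive `(0, 1)`, a negative at `(3, 4)` gives zero loss (distances 1 and 5), while a negative at `(1, 0)` gives a loss of 1.0 because both samples are equally close.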

5. Testing

$ type4py predict --o $OUTPUT_DIR --c

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.
--c: Tests the model on the combined prediction task. Use --a or --r instead for the argument or return type prediction task, respectively. This flag must match the one used in the learning step.
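To show how a similarity-learned model can turn an embedding into ranked type predictions, here is a minimal k-nearest-neighbour sketch. It illustrates the general technique, not the repository's prediction code: the toy 2-D "type space", the inverse-distance weighting, and the parameter names `k` and `top_n` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_type_predictions(query: Sequence[float],
                         type_space: List[Tuple[Sequence[float], str]],
                         k: int = 3, top_n: int = 2) -> List[Tuple[str, float]]:
    """Rank types by inverse-distance-weighted votes of the k nearest labelled points."""
    nearest = sorted(((_distance(query, vec), label) for vec, label in type_space),
                     key=lambda pair: pair[0])[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9)  # closer neighbours get a larger weight
    total = sum(votes.values())
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    # Normalize weights so they can be read as confidence scores.
    return [(label, weight / total) for label, weight in ranked[:top_n]]
```

For a query at `(0.05, 0.0)` in a space with two `int` points near the origin, one `str` point at `(1, 1)`, and a distant `List[int]` point, the ranking is `int` first, then `str`.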

6. Evaluating

$ type4py eval --o $OUTPUT_DIR --c --tp 10

Description:

$OUTPUT_DIR: The path that was used in the first step to store processed projects.
--c: Evaluates the model on the combined prediction task. Use --a or --r instead for the argument or return type prediction task, respectively. This flag must match the one used in the learning step.
--tp 10: Considers the top-10 predictions for evaluation. The value must be a positive integer between 1 and 10. [Optional]
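The `--tp` value controls how many ranked predictions count toward a hit. As a sketch of the idea, here is an exact-match top-k accuracy. This is an assumption about the metric, not the code behind `type4py eval`, which may apply additional matching rules; the function name and data layout are hypothetical.

```python
from typing import List, Sequence


def top_k_accuracy(predictions: List[Sequence[str]], ground_truth: List[str],
                   k: int = 10) -> float:
    """Fraction of samples whose true type appears among the first k predictions."""
    if not 1 <= k <= 10:
        raise ValueError("k must be an integer between 1 and 10, mirroring --tp")
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    hits = sum(1 for preds, truth in zip(predictions, ground_truth) if truth in preds[:k])
    return hits / len(ground_truth)
```

With predictions `[["int", "str"], ["str", "int"], ["bool", "None"]]` and true types `["int", "int", "str"]`, top-1 accuracy is 1/3 and top-2 accuracy is 2/3: a larger k can only keep or raise the score.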
