Roosterize is a tool for suggesting lemma names in verification projects that use the Coq proof assistant. The tool is based on leveraging neural networks that take serialized Coq lemma statements and elaborated terms as input; see the Technique section below.
We strongly recommend installing the required versions of OCaml, Coq, and SerAPI via the OPAM package manager, version 2.0.6 or later.
To set up the OPAM-based OCaml environment, use:
opam switch create 4.07.1
opam switch 4.07.1
eval $(opam env)
Then, install Coq and SerAPI, pinning them to avoid unintended upgrades:
opam update
opam pin add coq 8.10.2
opam pin add coq-serapi 8.10.0+0.7.1
We recommend installing the required versions of Python and PyTorch using Conda.
Then, install PyTorch by following the instructions here, using the installation command suitable for your operating system, Python package manager, and whether you want to use it on a CPU or GPU.
Next, either download a Roosterize release or clone the Roosterize repository and enter the directory:
git clone https://github.com/EngineeringSoftware/roosterize.git
cd roosterize
To install other required Python libraries (pip
is included with
Python installation):
pip install -r requirements.txt
Roosterize's pre-trained models are available for separate download.
Put the archive with the models (models.tgz
) in the Roosterize root
directory, and then uncompress it:
tar xzf models.tgz
To use Roosterize on a Coq verification project, you need to first
build and install the project with the command provided by the project
(usually make
followed by make install
). Then, at the root directory of
the project repository, run the command
python -m roosterize.main suggest_lemmas \
--project=$PATH_TO_PROJECT \
--serapi_options=$SERAPI_OPTIONS \
--model-dir=./models/roosterize-ta \
--output=./output
where $PATH_TO_PROJECT
should be replaced with the path to the
project, and $SERAPI_OPTIONS
should be replaced with the SerAPI
command line options for mapping logical paths to directories (see SerAPI's
documentation).
For example, if the logical path (inside Coq) for the project is Verified
,
you should set SERAPI_OPTIONS="-R $PATH_TO_PROJECT,Verified"
.
The command extracts all lemmas from the project, uses Roosterize's
pre-trained model (at ./models/roosterize-ta
) to predict a lemma name
for each lemma, and finally prints the lemma name update suggestions,
i.e., the predicted lemma names that are different from to the existing ones.
Below is an example of printed suggestions:
>>>>> Suggestions:
infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.BCH_PCM_altP1 -> inde_F2
infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.BCH_PCM_altP2 -> inde_mul
infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.PCM_altP -> F2_eq0
...
For other usages and command line interfaces of Roosterize, please check the help:
python -m roosterize.main help
Roosterize learns and suggests lemma names using neural networks that take serialized Coq lemma statements and elaborated terms as input. For example, the Coq lemma sentence
Lemma mg_eq_proof L1 L2 (N1 : mgClassifier L1) : L1 =i L2 -> nerode L2 N1.
is serialized into the following tokens (simplified):
(Sentence((IDENT Lemma)(IDENT mg_eq_proof)(IDENT L1)(IDENT L2)
(KEYWORD"(")(IDENT N1)(KEYWORD :)(IDENT mgClassifier)
(IDENT L1)(KEYWORD")")(KEYWORD :)(IDENT L1)(KEYWORD =i)(IDENT L2)
(KEYWORD ->)(IDENT nerode)(IDENT L2)(IDENT N1)(KEYWORD .)))
and the corresponding elaborated term (simplified):
(Prod (Name (Id char)) ... (Prod (Name (Id L1)) ...
(Prod (Name (Id L2)) ... (Prod (Name (Id N1)) ...
(Prod Anonymous (App (Ref (DirPath ((Id ssrbool) (Id ssr) (Id Coq))) (Id eq_mem)) ...
(Var (Id L1)) ... (Var (Id L2)))
(App (Ref (DirPath ((Id myhill_nerode) (Id RegLang))) (Id nerode)) ...
(Var (Id L2)) ... (Var (Id N1))))))))
The diagram below illustrates Roosterize's neural network architecture, as applied to this example:
Our research paper outlines the design of Roosterize,
and describes an evaluation on a corpus
of serialized Coq code derived from the Mathematical Components
family of projects. The training, validation, and testing sets of Coq files from the corpus
used in the evaluation are defined in the training
directory.
If you have used Roosterize in a research project, please cite the research paper in any related publication:
@inproceedings{NieETAL20Roosterize,
author = {Nie, Pengyu and Palmskog, Karl and Li, Junyi Jessy and Gligoric, Milos},
title = {Deep Generation of {Coq} Lemma Names Using Elaborated Terms},
booktitle = {International Joint Conference on Automated Reasoning},
pages = {To appear},
year = {2020},
}