AI apps/benchmark for legaltech
DEMO with Huggingface, Jina and European Court of Justice judgments
This site contains 113 judgments of the European Court of Justice from 2019 and 2020 concerning tax issues.
All sentences from the judgments have been encoded with a BERT model (bert-base-uncased, provided by Huggingface's transformers library), an example of a very powerful NLP model that is now widely used across AI applications.
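BERT produces one vector per token (768 dimensions for bert-base-uncased), so a pooling step is needed to get a single vector per sentence. How this demo pools is not stated here; the sketch below assumes mean pooling over non-padding tokens, which is one common choice. The function name `mean_pool` and the tiny 2-dimensional vectors are hypothetical and only for illustration.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped).

    Assumption: mean pooling; the demo may pool differently (e.g. via [CLS]).
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three "tokens" with 2-dimensional vectors; the last one is padding.
sentence_vector = mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [1, 1, 0],
)
```

With real model output the same idea applies per sentence, only with longer vectors and more tokens.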
The search infrastructure is built on Jina, a scalable library for designing neural search engines based on recent deep learning techniques.
This approach, like Jina and Huggingface themselves, has a great future in legal tech, because lawyers work with large numbers of documents and searching among them is highly challenging.
How does it work?
- Type a phrase or sentence
- Press Enter
- You get the most similar sentences (the lower the score, the better)
Enjoy, and be aware that this is a playground. Sometimes BERT does not return relevant hits, but sometimes the analogies are pretty impressive, for example:
QUERY: that complaint was rejected
RESULTS:
- That request was rejected.
- Its application was rejected, as was the objection that it subsequently lodged.
- That request was rejected.
- That is unfair and unlawful.
- That argument cannot be accepted.
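The ranking behind results like these can be sketched as a nearest-neighbour search over sentence vectors. The metric used by the demo is not stated here, so this sketch assumes cosine distance (lower is better, matching the score convention above). The function names and the 3-dimensional toy vectors are hypothetical stand-ins for real BERT embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=5):
    """Return the k (score, sentence) pairs with the lowest distance."""
    scored = [(cosine_distance(query_vec, vec), sentence) for sentence, vec in indexed]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


# Toy index: made-up vectors attached to sentences from the example above.
index = [
    ("That is unfair and unlawful.", [0.0, 0.2, 0.9]),
    ("That argument cannot be accepted.", [0.6, 0.6, 0.1]),
    ("That request was rejected.", [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "that complaint was rejected"
results = top_k(query_vec, index, k=2)
```

A linear scan like this is fine for a few thousand sentences; a real deployment would leave the indexing and lookup to the search engine.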
I plan to test other approaches (such as other transformer architectures) and to fine-tune the model, in order to prepare an ultimate benchmark of AI solutions for legaltech. Stay tuned and follow me on LinkedIn, Twitter and my blog at inteliLex.
If you would like to test it on more documents and play with the code, please clone this git repository and contribute to it.
Feel free to contact me: artur.tanona@gmail.com.
The steps below show how to launch it on Ubuntu. If you are working on another system (for example Windows 10), you need a running Docker engine and can skip to the last part, "Run on Docker".
Upload documents in *.txt format to search_engine/data and frontendApp/src/assets.
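Since the same files have to end up in both directories, a small helper can do the copying. This is only a convenience sketch: `copy_documents` is a hypothetical name, not part of the repository, and `my_judgments` in the usage comment is a placeholder for wherever your *.txt files live.

```python
import shutil
from pathlib import Path


def copy_documents(source_dir, target_dirs):
    """Copy every *.txt file from source_dir into each target directory.

    Returns the sorted list of copied file names.
    """
    copied = []
    for src in sorted(Path(source_dir).glob("*.txt")):
        for target in target_dirs:
            dest = Path(target)
            dest.mkdir(parents=True, exist_ok=True)  # create the folder if missing
            shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return copied


# Example call (run from the repository root; "my_judgments" is a placeholder):
# copy_documents("my_judgments", ["search_engine/data", "frontendApp/src/assets"])
```

Files with other extensions are ignored, so the folder can hold notes or PDFs next to the text files.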
Set the following environment variables:
export JINA_PARALLEL=1
export JINA_SHARDS=1
export CLIENT_PORT=80
export JINA_WORKSPACE=test_index
export JINA_MAX_DOCS=100
export JINA_PORT=65481
Then put the same values in the .env/.env file:
JINA_PARALLEL=1
JINA_SHARDS=1
CLIENT_PORT=80
JINA_WORKSPACE=test_index
JINA_MAX_DOCS=100
JINA_PORT=65481
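To make the KEY=VALUE format concrete, here is a minimal sketch of how such a file could be read. How app.py and docker-compose actually load these values is not shown here, so `parse_env` is a hypothetical helper for illustration only. Note that in this sketch, if a key appears more than once, the later value wins.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()  # a repeated key overwrites the earlier one
    return values


sample = """\
JINA_PARALLEL=1
JINA_SHARDS=1
CLIENT_PORT=80
JINA_WORKSPACE=test_index
JINA_MAX_DOCS=100
JINA_PORT=65481
"""

settings = parse_env(sample)
max_docs = int(settings["JINA_MAX_DOCS"])  # values are strings until converted
```

All values come back as strings, so numeric settings such as ports need an explicit conversion before use.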
Create a virtual environment and activate it. Then install the requirements, run the indexing task in the search_engine directory and start the application:
cd search_engine
pip install -r requirements.txt
python app.py -t index
cd ..
sudo docker-compose up
(If you are running on Google Cloud, update the environment.prod.ts file with your host and add firewall rules.)
You can then open the website at http://localhost:4200
The demo does not yet include PolTaxBERT and StateAid BERT; they will be included soon.