This repository contains the code accompanying our in-progress workshop paper. We demonstrate that the attention distributions of trained BERT models carry a strong enough signal to serve directly as input to downstream shallow neural networks, which reduces the amount of data required to train classification models.
The data can be found here: https://drive.google.com/uc?id=1dfN-WvFMiAWuOXq1VJ_EnpTDGQruWuxm&export=download
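The overall idea can be sketched as follows: extract per-head attention distributions from a trained BERT model, pool them into a fixed-size feature vector, and train a small classifier on those features. The sketch below is a self-contained illustration only, using synthetic attention tensors, an illustrative entropy-based feature map, and a hand-rolled logistic regression as the "shallow network"; the function names and the feature choice are hypothetical, not this repo's actual API. In practice the attentions would come from a real model (e.g. a Hugging Face `transformers` BERT called with `output_attentions=True`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for BERT-style attention: shape (layers, heads, seq, seq),
# each row normalised to sum to 1 like a real attention distribution.
# (Hypothetical shapes/defaults for illustration, not the repo's pipeline.)
def fake_attention(layers=12, heads=12, seq=16):
    a = rng.random((layers, heads, seq, seq))
    return a / a.sum(axis=-1, keepdims=True)

# One illustrative feature map: mean entropy of each head's attention rows,
# yielding one scalar per (layer, head) -> a layers*heads-dim vector.
def attention_features(attn):
    ent = -(attn * np.log(attn + 1e-9)).sum(axis=-1)  # (layers, heads, seq)
    return ent.mean(axis=-1).ravel()                  # (layers * heads,)

# A minimal "shallow network": logistic regression trained by gradient descent.
def train_logreg(X, y, lr=0.5, steps=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy dataset: standardised features, with labels that are linearly
# recoverable from them, so the classifier has something to learn.
X = np.stack([attention_features(fake_attention()) for _ in range(64)])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (X.mean(axis=1) > np.median(X.mean(axis=1))).astype(float)
w, b = train_logreg(X, y)
acc = ((X @ w + b > 0).astype(float) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Because the feature vector is small and fixed-size regardless of sequence length, even a simple linear model on top of it can be trained with far fewer labelled examples than fine-tuning the full BERT model would need.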