This is a Docker image that provides a web application for producing training material for ML-based reference extraction and segmentation engines. The following engines are currently supported:
- AnyStyle (Annotation & reference extraction)
- EXparser (editing of existing annotations only; EXparser reference extraction was supported in v1.0.0)
The web UI is used to produce the training material needed to improve citation recognition for corpora of scholarly literature on which the current models do not perform well.
A demo of the web frontend (without backend functionality) is available here.
- Install Docker
- Clone this repo with:
git clone https://github.com/cboulanger/excite-docker.git && cd excite-docker
- Build the Docker image:
./bin/build
- If you want to use AnyStyle, please consult its GitHub page for installation instructions: https://github.com/inukshuk/anystyle
- Run server:
./bin/start-servers
- Open frontend at http://127.0.0.1:8000/web/index.html
- Click on "Help" for instructions (also lets you download the Zotero add-ons)
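After starting the servers, it can be handy to check from a second terminal that the frontend actually responds before opening it in the browser. The following is a minimal sketch, not part of this repository: the function name wait_for_frontend and its retry parameters are hypothetical, and it assumes curl is installed on the host.

```shell
# Hypothetical helper (not shipped with this repo): poll a URL until it
# answers successfully or the attempts run out.
wait_for_frontend() {
  url="${1:-http://127.0.0.1:8000/web/index.html}"
  attempts="${2:-30}"   # how many times to try
  delay="${3:-1}"       # seconds to wait between tries
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent,
    # -o /dev/null: discard the body, --max-time: do not hang forever
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "frontend is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "frontend did not respond at $url" >&2
  return 1
}
```

Called without arguments, wait_for_frontend polls the frontend URL given above for up to 30 attempts; a different URL, attempt count and delay can be passed as the first, second and third argument.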
You can connect the app to a local Zotero client to upload extracted references. This feature requires additional Zotero add-ons, which can be downloaded via "Help" in the frontend.
The webapp will then enable additional commands that let you retrieve the PDF attachment(s) of the currently selected item/collection, extract references from them and store them with the citing item.
If the Zotero storage folder is not located in ~/Zotero/storage, you need to rename .env.dist to .env and, in this file, set the ZOTERO_STORAGE_PATH environment variable to the path pointing to this directory.
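The .env setup can be sketched as a small shell function. This is an illustration only: the function name set_zotero_storage_path is hypothetical, the file is assumed to use plain KEY=value lines, and the sketch copies .env.dist instead of renaming it so that the template stays around.

```shell
# Hypothetical helper (not shipped with this repo): make sure .env exists
# and contains exactly one ZOTERO_STORAGE_PATH assignment.
set_zotero_storage_path() {
  storage_path="$1"
  env_file="${2:-.env}"
  # Start from the distributed template when no .env exists yet
  # (assumption: copy rather than rename, to keep the template)
  if [ ! -f "$env_file" ] && [ -f "${env_file}.dist" ]; then
    cp "${env_file}.dist" "$env_file"
  fi
  touch "$env_file"
  # Drop any previous assignment; grep exits non-zero when nothing is left
  grep -v '^ZOTERO_STORAGE_PATH=' "$env_file" > "${env_file}.tmp" || true
  mv "${env_file}.tmp" "$env_file"
  # Append the new assignment (assumed KEY=value format)
  printf 'ZOTERO_STORAGE_PATH=%s\n' "$storage_path" >> "$env_file"
}
```

Run from the repository root, a call such as set_zotero_storage_path "$HOME/Documents/Zotero/storage" (an example path, not a default) leaves all other settings in .env untouched and replaces only the storage path.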