Scripts for tidying and indexing documents for Reguleque, a search engine for public worker transparency records from the Chilean government.
To install the requirements with Poetry:
poetry install
Alternatively, install them with pip (may be outdated):
pip3 install -r requirements.txt
To upload to production, remove the example files from data/in
and add your own CSV files. (They can be in subfolders, as long as the extension is .csv
).
Then, run the script as follows:
python3 reguleque_search/load_documents.py
You'll be asked to enter the admin API key for the public endpoint (you can also use the environment variable TYPESENSE_API_KEY
for this).
By default, the script will not modify existing documents or the collection schema. To overwrite the default schema and drop all documents:
Warning: This will delete all the documents in the Typesense collection.
python3 reguleque_search/load_documents.py --drop
You should install Typesense.
Then start a Typesense instance with:
typesense-server --data-dir=data/out/ --api-key=<ADMIN-API-KEY>
Warning: The provided API key has full read/write access on collections. A search-only API key must be generated for use with the frontend in production.
And then load and index the revenue entries on the running instance using the samples provided in data/in
:
python3 reguleque_search/load_documents.py --protocol http --endpoint 127.0.0.1 --port 8108
The accompanying frontend can be found on agucova/regulf-neo.