The aqf-ai project answers a specific question based on one or more previously filled questionnaires.
- Only .xlsx files are supported for now. In .env, set TARGET_SHEET_NAMES to the most common part(s) of the sheet names to search for within the files.
- English only.
1. Run ./scripts/setup.sh to set up the project.
2. Paste your .xlsx files into the data/raw_filled_security_questionnaire folder.
3. Create a .env file in the project root directory and add all the environment variables (list and description below).
4. Run
   python main.py --action indexdb --filled-questionnaire-files-path {files_path}
   If you followed step 2, use 'data/raw_filled_security_questionnaire' as {files_path}.
5. Run
   python main.py --action answer --question "{your_question}"
   Quote the question so the shell passes it through unchanged (for example, it will not try to expand ? as a wildcard). Example below (your answer may vary).
6. Repeat step 5 as many times as you have questions :)
python main.py --action answer --question "Is disciplinary or sanction policy exists for employees who have violated security policies?"
Proposed answer: Adyen has a formal disciplinary and sanctions policy established for employees who have violated security policies and procedures.
This example is based on an existing Adyen Security Questionnaire (link).
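To illustrate what the two commands might do, here is a minimal, self-contained sketch of a retrieval-augmented flow, assuming that indexdb embeds each previously answered question and that answer looks up the most similar stored pairs and passes them to the LLM as context. This is not the project's code: a word-count vector stands in for the OpenAI embedding model, an in-memory list stands in for the vector store, build_index, retrieve and build_prompt are hypothetical names, and the sample Q/A pairs are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pairs):
    """'indexdb' step (sketch): keep each (question, answer) with its question's vector."""
    return [(embed(question), question, answer) for question, answer in pairs]


def retrieve(index, question: str, k: int = 3):
    """'answer' step, part 1 (sketch): the k stored pairs most similar to the new question."""
    query = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [(q, a) for _, q, a in ranked[:k]]


def build_prompt(question: str, examples) -> str:
    """'answer' step, part 2 (sketch): the retrieved pairs become the LLM's context."""
    context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        "Answer the new question using only the previously answered questions below.\n\n"
        f"{context}\n\nNew question: {question}\nProposed answer:"
    )


if __name__ == "__main__":
    # Made-up sample pairs, standing in for rows read from a filled questionnaire.
    index = build_index([
        ("Do you have a disciplinary policy for security violations?",
         "Yes, a formal disciplinary policy exists."),
        ("Is data encrypted at rest?", "Yes, stored data is encrypted."),
    ])
    new_question = "Is there a sanction policy for employees who violate security policies?"
    print(build_prompt(new_question, retrieve(index, new_question, k=1)))
```

In a real setup, embed would presumably call the model named in OPENAI_EMBEDDING_MODEL, the index would live in the vector store named by COLLECTION_NAME, and the prompt would be sent to OPENAI_LLM_MODEL.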
Environment variables (.env):
OPENAI_API_KEY="sk-***" -- OpenAI API key (see the OpenAI platform, link)
OPENAI_LLM_MODEL="gpt-3.5-turbo" -- OpenAI model used to generate the answers
OPENAI_EMBEDDING_MODEL="text-embedding-ada-002" -- OpenAI model used for embeddings (keep text-embedding-ada-002, it is cheap)
COLLECTION_NAME="aqf-ai-vector-store" -- name of the vector store collection (string)
TARGET_SHEET_NAMES="uestion, Questionnaire, QAsheet" -- comma-separated list of possible sheet names to search for within the .xlsx files. Use the most common part of the sheet names (e.g. "uestion" is common to both "Question" and "question").
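A minimal sketch of how these variables could be consumed: read them, fail early if one is missing, split TARGET_SHEET_NAMES on commas, and keep only the workbook sheets whose names contain one of the fragments. It assumes the .env file has already been loaded into the process environment (for example with python-dotenv's load_dotenv()), and that matching is a case-sensitive substring test; Settings, load_settings and select_target_sheets are hypothetical names, not the project's API.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Variables from the list above; all are treated as required in this sketch.
REQUIRED = ("OPENAI_API_KEY", "OPENAI_LLM_MODEL", "OPENAI_EMBEDDING_MODEL",
            "COLLECTION_NAME", "TARGET_SHEET_NAMES")


@dataclass
class Settings:
    openai_api_key: str
    llm_model: str
    embedding_model: str
    collection_name: str
    target_sheet_names: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing .env variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        llm_model=env["OPENAI_LLM_MODEL"],
        embedding_model=env["OPENAI_EMBEDDING_MODEL"],
        collection_name=env["COLLECTION_NAME"],
        # "uestion, Questionnaire, QAsheet" -> ["uestion", "Questionnaire", "QAsheet"]
        target_sheet_names=[part.strip() for part in env["TARGET_SHEET_NAMES"].split(",")
                            if part.strip()],
    )


def select_target_sheets(sheet_names, patterns):
    """Keep sheets whose name contains any pattern (assumed case-sensitive substring match)."""
    return [name for name in sheet_names if any(p in name for p in patterns)]
```

With openpyxl, the sheet names of one workbook are available as openpyxl.load_workbook(path, read_only=True).sheetnames, so select_target_sheets(that_list, settings.target_sheet_names) would give the sheets to read. A substring rule is also why a fragment such as "uestion" covers both capitalizations.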
WIP (work in progress) notes; feel free to skip them.
Main pillars and other ideas:
- Work with many kinds of unstructured data (xls, pdf, doc, etc.)
- Different RAG approaches (naive, cluster, parent, rerank)
- Ensemble method for the final LLM generation
- Read a questionnaire file and fill it in automatically based on one or more previously filled questionnaires.
- I considered plugging in a chatbot framework right away. The advantage is that such frameworks can connect to many communication channels (WhatsApp, Telegram, Viber, etc., voice) and to databases, both relational and vector. Good candidates: microsoft botbuilder, botpress, rasa. But that would be a complex system, and something simpler is needed to start with... Going back to the idea of running it from a script; we'll see later.
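The ensemble idea is still open. One simple reading of it: generate an answer with several RAG variants (naive, rerank, ...) and keep the answer the others agree with most. The sketch below does that using word overlap (Jaccard similarity) as the agreement measure; the measure, the tie-breaking rule, the function names and the sample answers are all assumptions for illustration, not a chosen design.

```python
import re


def _tokens(text: str) -> set:
    """Lower-cased word set of an answer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _jaccard(a: set, b: set) -> float:
    """Word-overlap similarity in [0, 1]; two empty answers count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_consensus_answer(candidates: list[str]) -> str:
    """Return the candidate with the highest total overlap with all other candidates.

    Ties go to the earliest candidate, so list the preferred pipeline first.
    """
    if not candidates:
        raise ValueError("no candidate answers to combine")
    token_sets = [_tokens(c) for c in candidates]

    def agreement(i: int) -> float:
        return sum(_jaccard(token_sets[i], token_sets[j])
                   for j in range(len(candidates)) if j != i)

    return candidates[max(range(len(candidates)), key=agreement)]


if __name__ == "__main__":
    answers = [
        "Yes, a formal disciplinary policy exists.",    # e.g. from a naive RAG pipeline
        "Yes, there is a formal disciplinary policy.",  # e.g. from a rerank pipeline
        "No information found.",                        # e.g. from a parent-document pipeline
    ]
    print(pick_consensus_answer(answers))  # Yes, a formal disciplinary policy exists.
```

Word overlap is crude for free-text answers; comparing answer embeddings, or asking the LLM itself to merge the candidates, are alternatives worth weighing.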