FUSEARCH

A console-based full-text search tool for document collections, written in Python 3. It converts documents of various types, such as PDF and Word files, to text and builds a simple inverted index for queries.
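To illustrate what a simple inverted index does, here is a minimal sketch. It is not fusearch's actual code: the function names (`tokenize`, `build_index`, `search`) and the sample documents are assumptions made for this example, and the text is assumed to be already extracted from the source files.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query):
    """Return the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical documents, already converted to plain text.
docs = {
    "a.txt": "Full-text search for document collections",
    "b.txt": "A simple inverted index for queries",
}
index = build_index(docs)
print(sorted(search(index, "simple index")))  # ['b.txt']
```

A query is answered by intersecting the document sets of its tokens, so the cost depends on the number of matching documents rather than on the size of the whole collection.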

The index is kept in a SQLite file inside the indexed directory.
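As a rough sketch of how an index could be stored in a SQLite file next to the documents, the example below uses Python's standard `sqlite3` module. The file name `index_demo.sqlite`, the `postings` table, and the helper functions are hypothetical and chosen only for illustration; the schema fusearch really uses may differ.

```python
import os
import sqlite3
import tempfile


def open_index(directory, filename="index_demo.sqlite"):
    """Open (or create) a SQLite index file inside the indexed directory."""
    conn = sqlite3.connect(os.path.join(directory, filename))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "token TEXT NOT NULL, path TEXT NOT NULL, "
        "PRIMARY KEY (token, path))"
    )
    return conn


def add_document(conn, path, tokens):
    """Record which tokens occur in the document at `path`."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO postings (token, path) VALUES (?, ?)",
            [(token, path) for token in set(tokens)],
        )


def lookup(conn, token):
    """Return the sorted paths of documents containing `token`."""
    rows = conn.execute(
        "SELECT path FROM postings WHERE token = ? ORDER BY path", (token,)
    )
    return [row[0] for row in rows]


# A temporary directory stands in for the indexed directory.
with tempfile.TemporaryDirectory() as directory:
    conn = open_index(directory)
    add_document(conn, "report.pdf", ["full", "text", "search"])
    add_document(conn, "notes.doc", ["text", "index"])
    print(lookup(conn, "text"))  # ['notes.doc', 'report.pdf']
    conn.close()
```

Keeping the file in the indexed directory means the index travels with the documents and no separate database server is needed.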

This software is in ALPHA status.

How to run

It is recommended to create and activate a virtual environment first:

virtualenv -p $(which python3) venv
source venv/bin/activate

Edit fusearch.yaml and add a directory to index.

Install the package, then start the daemon in foreground mode (-f) to watch the indexing process take place:

pip install -e .
fusearchd.py -f -c fusearch.yaml

Dependencies

System packages required by textract:

apt-get install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr \
flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig

