A full-text search engine built in Node.js, using an inverted index as its search data structure.
NOTE: this implementation is not production-ready yet. It only implements in-memory data storage for full-text search, coupled with an Express server for external use.
The search functionality is implemented as the following pipeline.
Before the data is stored, it is cleaned and processed by these filters:
- tokenization
- lowercase filter
- stopword filter
- punctuation filter
- stemming filter [TODO]
For the detailed implementation, see search.ts.
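A minimal TypeScript sketch of this filter chain follows. The stopword list, the ASCII-only punctuation rule, and the function names (`tokenize`, `lowercaseFilter`, `punctuationFilter`, `stopwordFilter`, `analyze`) are hypothetical and need not match search.ts. One deliberate choice: the sketch strips punctuation before the stopword check, so a token such as `The,` is still recognized as the stopword `the`.

```typescript
// Hypothetical stopword list for illustration; search.ts may use a different one.
const STOPWORDS: ReadonlySet<string> = new Set([
  "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
  "is", "it", "of", "on", "or", "the", "to", "was", "with",
]);

// Tokenization: split raw text on runs of whitespace.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Lowercase filter: makes matching case-insensitive.
function lowercaseFilter(tokens: string[]): string[] {
  return tokens.map((t) => t.toLowerCase());
}

// Punctuation filter: removes every character except a-z and 0-9
// (an ASCII-only assumption), then drops tokens that became empty.
function punctuationFilter(tokens: string[]): string[] {
  return tokens
    .map((t) => t.replace(/[^a-z0-9]+/g, ""))
    .filter((t) => t.length > 0);
}

// Stopword filter: drops very common words that carry little meaning.
function stopwordFilter(tokens: string[]): string[] {
  return tokens.filter((t) => !STOPWORDS.has(t));
}

// Full chain; a stemming step would go last once implemented.
function analyze(text: string): string[] {
  return stopwordFilter(punctuationFilter(lowercaseFilter(tokenize(text))));
}

console.log(analyze("The Quick, brown fox!")); // [ 'quick', 'brown', 'fox' ]
```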
Once the tokens are processed, they are added to an inverted index, which maps each token to the list of documents that contain it. Currently the server maintains only one index for searching and returns the documents that match every query token, i.e. the intersection of the document lists of all query tokens.
[TODO]: implement an inverted index with weighted ranks to support the union of search query tokens.
For the detailed implementation of the inverted index, see post.ts.
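The indexing and intersection described above can be sketched as follows. This is an illustrative TypeScript sketch, not the code in post.ts: `InvertedIndex` and its methods are hypothetical, and `analyze` is a simplified stand-in for the filter pipeline. `searchAny` is an equally hypothetical take on the [TODO], using the number of matched query tokens as a simple rank rather than a proper weighting scheme such as TF-IDF.

```typescript
// Simplified stand-in for the filter pipeline: lowercase alphanumeric tokens.
function analyze(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class InvertedIndex {
  // token -> ids of the documents that contain the token
  private readonly postings = new Map<string, Set<string>>();

  add(docId: string, text: string): void {
    for (const token of analyze(text)) {
      let ids = this.postings.get(token);
      if (ids === undefined) {
        ids = new Set<string>();
        this.postings.set(token, ids);
      }
      ids.add(docId);
    }
  }

  // Intersection: ids of documents that contain every query token.
  search(query: string): string[] {
    const tokens = analyze(query);
    if (tokens.length === 0) return [];
    const lists = tokens.map((t) => this.postings.get(t) ?? new Set<string>());
    // Scan the smallest posting list and probe the others.
    lists.sort((a, b) => a.size - b.size);
    const [smallest, ...rest] = lists;
    return Array.from(smallest).filter((id) => rest.every((ids) => ids.has(id)));
  }

  // Hypothetical union for the [TODO]: every document matching any query
  // token, ranked by how many distinct query tokens it contains.
  searchAny(query: string): string[] {
    const scores = new Map<string, number>();
    for (const token of Array.from(new Set(analyze(query)))) {
      this.postings.get(token)?.forEach((id) => {
        scores.set(id, (scores.get(id) ?? 0) + 1);
      });
    }
    return Array.from(scores.entries())
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .map(([id]) => id);
  }
}

const index = new InvertedIndex();
index.add("1", "Human Communications Representative");
index.add("2", "Customer Communications Officer");
index.add("3", "Human Resources Manager");
console.log(index.search("human communications")); // [ '1' ]
console.log(index.searchAny("human communications")); // [ '1', '2', '3' ]
```

Starting from the shortest posting list keeps the intersection cost proportional to the rarest query token rather than the most common one.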
Documents can be retrieved sorted by several fields without re-sorting at query time, because a dedicated index is maintained and kept up to date for each of these fields.
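The README does not show the structure of these per-field indexes. The sketch below assumes one array of post ids per field, kept sorted with a binary-search insertion, and serves pages by slicing that array. `Post`, `SortField`, `SortedFieldIndex`, and the ISO 8601 string format for `dateLastEdited` are all hypothetical; pages are 0-based, matching the `page = 0` default in the API examples.

```typescript
// Hypothetical post shape; dateLastEdited is assumed to be an ISO 8601
// string, so lexicographic order equals chronological order.
interface Post {
  id: string;
  name: string;
  dateLastEdited: string;
}

type SortField = "name" | "dateLastEdited";

// Keeps post ids ordered by one field so pages can be sliced without sorting.
class SortedFieldIndex {
  private readonly ids: string[] = [];
  private readonly field: SortField;
  private readonly store: Map<string, Post>;

  constructor(field: SortField, store: Map<string, Post>) {
    this.field = field;
    this.store = store;
  }

  private valueOf(id: string): string {
    const post = this.store.get(id);
    if (post === undefined) throw new Error(`unknown post ${id}`);
    return post[this.field];
  }

  // Binary search for the insertion point: O(log n) comparisons, O(n) shift.
  insert(id: string): void {
    const value = this.valueOf(id);
    let lo = 0;
    let hi = this.ids.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.valueOf(this.ids[mid]) <= value) lo = mid + 1;
      else hi = mid;
    }
    this.ids.splice(lo, 0, id);
  }

  // To update a post: remove() it, change the field, then insert() it again.
  remove(id: string): void {
    const i = this.ids.indexOf(id);
    if (i !== -1) this.ids.splice(i, 1);
  }

  // 0-based page of `limit` posts, ascending or descending.
  page(page: number, limit: number, asc = true): Post[] {
    const n = this.ids.length;
    const start = page * limit;
    if (start >= n) return [];
    const end = Math.min(start + limit, n);
    // Descending pages are read from the tail of the ascending array.
    const slice = asc ? this.ids.slice(start, end) : this.ids.slice(n - end, n - start).reverse();
    return slice.map((id) => this.store.get(id) as Post);
  }
}

const store = new Map<string, Post>([
  ["p1", { id: "p1", name: "Carol", dateLastEdited: "2021-03-01T00:00:00Z" }],
  ["p2", { id: "p2", name: "Alice", dateLastEdited: "2021-01-01T00:00:00Z" }],
  ["p3", { id: "p3", name: "Bob", dateLastEdited: "2021-02-01T00:00:00Z" }],
]);
const byName = new SortedFieldIndex("name", store);
store.forEach((_post, id) => byName.insert(id));
console.log(byName.page(0, 2).map((p) => p.name)); // [ 'Alice', 'Bob' ]
console.log(byName.page(0, 2, false).map((p) => p.name)); // [ 'Carol', 'Bob' ]
```

Insertion pays the ordering cost once per write, so each read is just an array slice.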
Requirements
- docker
- docker-compose
Start the server with the following command:
docker-compose -f ./deployment/docker-compose.yaml up -d
# get all posts at page = 0
# posts ordered by dateLastEdited
localhost:3000/api/post
# get all posts at page = 2
# posts ordered by dateLastEdited
localhost:3000/api/post?page=2
# get all posts at page = 2
# posts ordered by dateLastEdited in descending order
localhost:3000/api/post?asc=false&page=2
# get all posts at page = 2 and limit 15
# posts ordered by dateLastEdited
localhost:3000/api/post?limit=15&page=2
# get all posts at page = 0
# posts ordered by name
localhost:3000/api/post?sortBy=name
# get all posts at page = 2
# posts ordered by name
localhost:3000/api/post?sortBy=name&page=2
# get all posts at page = 2
# posts ordered by name in descending order
localhost:3000/api/post?asc=false&sortBy=name&page=2
# get all posts at page = 2 and limit 15
# posts ordered by name
localhost:3000/api/post?limit=15&sortBy=name&page=2
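These requests can also be issued from code. The sketch below is a hypothetical TypeScript helper (`buildPostUrl` and `PostQuery` are not part of the project) that assembles the query string with the standard `URLSearchParams` class, which also percent-encodes values such as the quoted phrases used in the search examples. The base URL assumes the default `localhost:3000` setup above.

```typescript
// Query parameters used by the /api/post examples in this README.
interface PostQuery {
  page?: number;
  limit?: number;
  sortBy?: string; // e.g. "name"; without it, posts are ordered by dateLastEdited
  asc?: boolean;
  searchIn?: string; // e.g. "name" or "description"
  query?: string;
}

// Builds a /api/post URL, skipping parameters that were not provided.
function buildPostUrl(params: PostQuery, base = "http://localhost:3000/api/post"): string {
  const qs = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) qs.set(key, String(value));
  }
  const encoded = qs.toString();
  return encoded ? `${base}?${encoded}` : base;
}

console.log(buildPostUrl({ asc: false, sortBy: "name", page: 2 }));
// http://localhost:3000/api/post?asc=false&sortBy=name&page=2
```

Passing the result to `fetch` (built into Node.js 18 and later) sends the request; the shape of the JSON response is not documented here.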
# search query in name
localhost:3000/api/post?searchIn=name&query=customer
# search query in name with pagination
localhost:3000/api/post?searchIn=name&page=1&query=human
# search with exact query
localhost:3000/api/post?searchIn=name&query="Human Communications Representative"
# search in description with pagination and limit 15
localhost:3000/api/post?limit=15&searchIn=description&page=1&query=vel
# exact search in description
localhost:3000/api/post?searchIn=description&query="Explicabo quae rerum dolorum nostrum aut"
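The README does not describe how exact (double-quoted) queries are matched. One plausible approach, assumed here rather than taken from the source, is to take the documents that contain all query tokens and keep only those in which the tokens appear contiguously and in order. In this hypothetical sketch, `parseQuery`, `containsPhrase`, and `matchDocs` are illustrative names, and the analyzer keeps stopwords so that word positions are preserved.

```typescript
// Stand-in analyzer: lowercase alphanumeric tokens, stopwords kept so that
// the relative positions of words survive for phrase matching.
function analyze(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// A query wrapped in double quotes is treated as an exact phrase.
function parseQuery(raw: string): { tokens: string[]; exact: boolean } {
  const trimmed = raw.trim();
  const exact = trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"');
  return { tokens: analyze(exact ? trimmed.slice(1, -1) : trimmed), exact };
}

// True if `phrase` occurs as a contiguous, in-order run inside `tokens`.
function containsPhrase(tokens: string[], phrase: string[]): boolean {
  if (phrase.length === 0) return false;
  for (let i = 0; i + phrase.length <= tokens.length; i++) {
    if (phrase.every((t, j) => tokens[i + j] === t)) return true;
  }
  return false;
}

// Exact queries need the phrase in order; other queries need every token.
function matchDocs(docs: Map<string, string>, raw: string): string[] {
  const { tokens, exact } = parseQuery(raw);
  if (tokens.length === 0) return [];
  const result: string[] = [];
  docs.forEach((text, id) => {
    const docTokens = analyze(text);
    const ok = exact
      ? containsPhrase(docTokens, tokens)
      : tokens.every((t) => docTokens.indexOf(t) !== -1);
    if (ok) result.push(id);
  });
  return result;
}

const docs = new Map<string, string>([
  ["1", "Human Communications Representative"],
  ["2", "Representative, Human Communications"],
  ["3", "Customer Service"],
]);
console.log(matchDocs(docs, '"Human Communications Representative"')); // [ '1' ]
console.log(matchDocs(docs, "human communications")); // [ '1', '2' ]
```

For brevity `matchDocs` scans every document; in practice the candidates would come from the inverted-index intersection, and only those would need the phrase check.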
All logs generated by the server are collected by fluentd and dumped here.
Akshit Sadana
- Github: @Akshit8
- LinkedIn: @akshitsadana