The ESPRESSO project researches, develops, and evaluates decentralised algorithms, meta-information data structures, and indexing techniques to enable large-scale data search across personal online datastores, taking into account varying access rights and caching requirements.
The ESPRESSO system (Figure below) contains the following components, which are installed alongside each Solid server in the network:

- The indexing app (Brewmaster) indexes the pods and creates and maintains the pod indices, along with the meta-index for the server (see below).
- The search app (CoffeeFilter) performs the local search on the server.
- The overlay network (the prototype system uses a custom build of GaianDB) connects the servers, and routes and propagates the queries.
- The user interface app (Barista) receives queries from the user and presents the search results.
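To picture the overlay's routing role, here is a minimal sketch of query propagation over a neighbour graph. It uses plain flooding as a simplifying assumption for illustration; GaianDB's actual routing logic is more involved, and the server names below are made up.

```python
# Illustrative only: propagate a query from one server to every server
# reachable over the overlay, by flooding to direct neighbours.
from collections import deque

def propagate(overlay, origin):
    """Return the set of servers a query reaches, starting at `origin`.
    `overlay` maps each server to its directly connected neighbours."""
    reached, frontier = {origin}, deque([origin])
    while frontier:
        node = frontier.popleft()
        for neighbour in overlay.get(node, []):
            if neighbour not in reached:
                reached.add(neighbour)
                frontier.append(neighbour)
    return reached

# Hypothetical 4-server overlay
overlay = {"s1": ["s2", "s3"], "s2": ["s4"], "s3": [], "s4": []}
print(sorted(propagate(overlay, "s1")))  # ['s1', 's2', 's3', 's4']
```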
At this stage, ESPRESSO has the following limitations:
- It covers only keyword-based searches; support for structured queries is planned.
- To enable top-k search in ESPRESSO, decentralized ranking algorithms must be developed.
- Data Splitter: given a textual dataset composed of one or more files, the dataset splitter chunks the dataset into a specified number of files of the required size.

```shell
cd /Automation/DatasetSplitter
python3 main.py
```
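As a sketch of what the splitter does (the real main.py may take different options and chunk by size rather than line count), chunking one text file into N roughly equal files looks like this:

```python
# Illustrative sketch of dataset splitting, not the bundled DatasetSplitter.
from pathlib import Path

def split_dataset(src, out_dir, n_chunks):
    """Chunk the text file `src` into `n_chunks` files under `out_dir`."""
    lines = Path(src).read_text().splitlines(keepends=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    per_chunk = -(-len(lines) // n_chunks)  # ceiling division
    paths = []
    for i in range(n_chunks):
        chunk = lines[i * per_chunk:(i + 1) * per_chunk]
        path = out / f"chunk_{i}.txt"
        path.write_text("".join(chunk))
        paths.append(path)
    return paths

# Example: a 100-line dataset split into 10 files of 10 lines each
Path("dataset.txt").write_text("".join(f"line {i}\n" for i in range(100)))
files = split_dataset("dataset.txt", "chunks", 10)
print(len(files))  # 10
```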
- Infrastructure and Deployment (Solid server and pod creation for each Solid server):
  - Solid Servers:

```shell
cd /Automation/CSS_Automation/
ansible-playbook -i inventory-50VM.ini solidservers.yaml --ask-become-pass
```

Note that inventory-50VM.ini contains the list of VM IPs, along with the ssh username and password credentials.
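The contents of inventory-50VM.ini are specific to each deployment; as a sketch, an Ansible INI inventory carrying per-host ssh credentials typically looks like the following (the group name, IPs, and credentials here are placeholders):

```ini
# Hypothetical example -- replace the IPs and credentials with your own.
[solidservers]
192.0.2.11 ansible_user=ubuntu ansible_password=CHANGEME
192.0.2.12 ansible_user=ubuntu ansible_password=CHANGEME
```

In practice, plaintext passwords in an inventory are best avoided; Ansible Vault or ssh keys are the usual alternatives.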
- Pods creation: (to be disentangled from the experiment setup).
- (Keyword-based) Indexing:
  - Housing pods with the generated files.
  - Generating indexes at each pod.
  - ACL specifications for files and indexes.
  - Validating access to files and indexes, using Penny, Postman, or curl http:...
- Automating Cloning of GaianDB and the Search App on Servers:
  You can do this by cloning and updating (pulling) the ESPRESSO repo on the machines automatically using the following Ansible playbook:

```shell
cd /Automation/CSS_Automation/
ansible-playbook -i inventory-50VM.ini espressorepos.yaml --ask-become-pass
```
- Hints about the Source Code and Build of GaianDB:
  In the GaianDB directory, you can find two subdirectories:
  1. GaianDB_Keyword_Search_SourceCode contains the source code of our GaianDB, including the solid-gaian connector.
  2. GaianDB_Keyword_Search_Build is the build version of the source code, including all the jar files necessary for running the service.

  You can clone the source code (in GaianDB_Keyword_Search_SourceCode) for further development. Make sure to build your developments using a build tool such as maven or ant, and add the generated build jars to the lib directory of GaianDB_Keyword_Search_Build.
- Starting the GaianDB Service on Servers:

```shell
cd /Automation/CSS_Automation/
ansible-playbook -i inventory-50VM.ini startGaianServers.yaml --ask-become-pass
```
- Running the Search App with Parameters on a Specified Number of Solid Servers:

```shell
cd /Automation/CSS_Automation/
ansible-playbook -i inventory-50VM.ini startSearcherServers.yaml --ask-become-pass
```
- Stopping the GaianDB Service and Search App on Servers:
  Stop the GaianDB service:

```shell
cd /Automation/CSS_Automation/
ansible-playbook -i inventory-50VM.ini stopGaianServers.yaml --ask-become-pass
```

  Stop the Search App:

```shell
cd /Automation/CSS_Automation/
ansible-playbook -i inventory-50VM.ini stopSearcherServers.yaml --ask-become-pass
```
There is a unified experiment setup called flexexperiment. By default, it creates pods, populates them with files, crawls the pods, indexes them, and uploads the indices through the Solid interface.
To create an experiment:

```shell
cd /Automation/ExperimentSetup
python3 flexexperiment.py podname firstserver lastserver sourcedir expsavedir numberofpods numberoffiles
```
where:

- podname: the common part of the pod names to be created. On each server there will be pods podname0, podname1, etc.
- firstserver: the number of the first server used.
- lastserver: the number of the last server used plus 1. The servers are currently taken from a hardcoded list.
- sourcedir: path to the local directory containing all the files for the experiment.
- expsavedir: path to the local directory where the experiment will be stored.
- numberofpods: the total number of pods in the experiment.
- numberoffiles: the total number of files in the experiment.
There are other variants that run the experiment slightly differently: one creates pods, populates them with files, crawls the pods, creates and stores all the pod indices locally, and then uploads them. There is also the capability for each pod to create one zip file containing all the files in the pod and another zip file with the index, and to upload the zip files to the servers via ssh, where they need to be unzipped into pods.
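The zip-and-ship variant above can be sketched as follows. The directory layout, file names, and index format here are assumptions for illustration, not the experiment scripts' actual conventions: each pod directory is packed into a files archive and a separate index archive, ready to transfer over ssh and unzip on the target server.

```python
# Illustrative sketch: pack one pod's files and its index into two archives.
import zipfile
from pathlib import Path

def pack_pod(pod_dir, out_dir):
    """Create <pod>_files.zip (all pod files) and <pod>_index.zip (the index)."""
    pod, out = Path(pod_dir), Path(out_dir)
    out.mkdir(exist_ok=True)
    files_zip = out / f"{pod.name}_files.zip"
    with zipfile.ZipFile(files_zip, "w") as zf:
        for f in pod.rglob("*"):
            if f.is_file() and f.name != "index.csv":  # assumed index file name
                zf.write(f, f.relative_to(pod))
    index_zip = out / f"{pod.name}_index.zip"
    with zipfile.ZipFile(index_zip, "w") as zf:
        index = pod / "index.csv"
        if index.exists():
            zf.write(index, index.name)
    return files_zip, index_zip

# Example pod with one data file and an index
Path("pod0").mkdir(exist_ok=True)
Path("pod0/file0.txt").write_text("hello")
Path("pod0/index.csv").write_text("hello,file0.txt\n")
fz, iz = pack_pod("pod0", "archives")
```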
- WISE 2023: Mohamed Ragab, Yury Savateev, Reza Moosaei, Thanassis Tiropanis, Alexandra Poulovassilis, Adriane Chapman, and George Roussos. 2023. ESPRESSO: A Framework for Empowering Search on Decentralized Web. In Web Information Systems Engineering – WISE 2023: 24th International Conference, Melbourne, VIC, Australia, October 25–27, 2023, Proceedings. Springer-Verlag, Berlin, Heidelberg, 360–375.
- InWeS 2021: Thanassis Tiropanis, Alexandra Poulovassilis, Age Chapman, and George Roussos. 2021. Search in a Redecentralised Web. In Computer Science Conference Proceedings: 12th International Conference on Internet Engineering & Web Services (InWeS 2021).
- Mohamed Ragab, University of Southampton, [email protected].
- Yury Savateev, University of Southampton, [email protected].
- Helen Oliver, University of Southampton, [email protected].
- Reza Moosaei, Queen Mary University of London, [email protected].
- Thanassis Tiropanis, University of Southampton, [email protected].
- Adriane Chapman, University of Southampton, [email protected].
- Alex Poulovassilis, Birkbeck, University of London, [email protected].
- George Roussos, Birkbeck, University of London, [email protected].
ESPRESSO is written and developed by the ESPRESSO project. This code is released under the AGPL-3.0 license.