This repository is our final project for our BSc degree in Data Science. The project is an attraction recommendation application for Dublin tourists. We won 3rd place for this project.

AttractMe

Attraction Recommendation Application

Inbal Croitoru & Almog Gueta

About The App

Given the user's current (source) bus stop, the app recommends an attraction based on the attractions' TripAdvisor ratings and, optionally, on bus-line delays.
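
The README does not spell out how ratings and delays are combined. The following is only a hypothetical sketch of one way to do it: keep the attractions within a radius of the source bus stop (great-circle distance), score each by its rating, and, if delays are supplied, subtract a penalty proportional to the current delay of the bus line serving it. The `Attraction` fields, the 3 km radius, the `delay_weight` of 0.1 and the sample data are all assumptions, not the app's actual logic.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Attraction:
    name: str
    lat: float
    lon: float
    rating: float   # TripAdvisor rating (0-5)
    bus_line: str   # hypothetical: the bus line used to reach this attraction


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def recommend(
    stop_lat: float,
    stop_lon: float,
    attractions: List[Attraction],
    max_km: float = 3.0,
    line_delays: Optional[Dict[str, float]] = None,
    delay_weight: float = 0.1,
) -> Optional[Attraction]:
    """Pick the highest-scoring attraction within max_km of the source stop.

    Hypothetical rule: score = rating, minus delay_weight * current delay
    (minutes) of the attraction's bus line when line_delays is given.
    Returns None if no attraction is within range.
    """
    best: Optional[Attraction] = None
    best_score = float("-inf")
    for attraction in attractions:
        if haversine_km(stop_lat, stop_lon, attraction.lat, attraction.lon) > max_km:
            continue  # too far from the source bus stop
        score = attraction.rating
        if line_delays is not None:
            score -= delay_weight * line_delays.get(attraction.bus_line, 0.0)
        if score > best_score:
            best, best_score = attraction, score
    return best


# Made-up sample data for illustration only (not real ratings or locations).
sample = [
    Attraction("A", 53.344, -6.255, 4.5, "L1"),
    Attraction("B", 53.342, -6.287, 4.7, "L2"),
    Attraction("C", 53.300, -6.100, 5.0, "L3"),  # roughly 12 km away, filtered out
]
print(recommend(53.350, -6.260, sample).name)                           # B
print(recommend(53.350, -6.260, sample, line_delays={"L2": 5.0}).name)  # A
```

With no delays, the higher-rated nearby attraction wins; a 5-minute delay on its line is enough to tip the choice to the other one under these made-up weights.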

Figure 1. Attraction Recommendation Application

About The Data

  • We used 230 million records from bus sensors in Dublin, collected between July 2017 and September 2018, as stream data.

  • We used TripAdvisor attraction ratings that we scraped from the TripAdvisor website.

  • We matched these attractions with the attractions listed on the Ireland open data portal: data.gov.ie/[link].

  • We also used bus-stop data containing the geographic location of each bus stop in Dublin, downloaded from the Smart Dublin website: link.
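
Beyond the notebook name 'attractions_schema_matching_using_NLP_methods.ipynb', the README does not describe how TripAdvisor attractions were matched to the data.gov.ie attractions. As a simple, hypothetical stand-in (not the notebook's NLP method), attraction names can be normalised and matched by character-level similarity using Python's standard `difflib` module. The 0.8 threshold and the example names are assumptions.

```python
import difflib
import re
from typing import Dict, Iterable, List, Optional


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def best_match(name: str, candidates: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return the candidate whose normalised name is most similar to `name`,
    or None if the best similarity ratio is below `threshold`."""
    target = normalise(name)
    best: Optional[str] = None
    best_ratio = 0.0
    for candidate in candidates:
        ratio = difflib.SequenceMatcher(None, target, normalise(candidate)).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best if best_ratio >= threshold else None


def match_all(tripadvisor_names: List[str], open_data_names: List[str],
              threshold: float = 0.8) -> Dict[str, Optional[str]]:
    """Map every TripAdvisor name to its closest open-data name (or None)."""
    return {name: best_match(name, open_data_names, threshold) for name in tripadvisor_names}


print(match_all(["The Guinness Storehouse", "Kilmainham Gaol"],
                ["Guinness Storehouse", "Dublin Zoo"]))
# {'The Guinness Storehouse': 'Guinness Storehouse', 'Kilmainham Gaol': None}
```

Returning None below the threshold keeps unmatched attractions visible for manual review instead of forcing a wrong pairing.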

About The Technology

We used Apache Spark™ and Jupyter Notebook as processing frameworks, and Elasticsearch as the data warehouse.

About The Requirements & Usage

  • Processing uses Spark 2.4.5 (PySpark) and Python 3.7.5.

  • See the 'Requirements.txt' file for the required libraries.

  • The instructions below assume that the code runs on Databricks.

  • Since the code uses Elasticsearch, you need to run the following command in your VM's terminal before running the code: 'sudo docker-compose up -d'

  • The code is written as if the stream is read from a Kafka server; see the 'read stream data' cell in 'final_app.ipynb'.

Final Task

To use the app as a user, open the dashboard at XXXXX/[AppDashboard] and follow the instructions.

To run the app code, run the following files in this order:
  1. 'attractions_schema_matching_using_NLP_methods.ipynb'
  2. 'create_all_static_data_dfs.ipynb'
  3. 'final_app.ipynb'

Note: 'final_app.ipynb' includes the code that creates the delay stream data and uploads it to Elasticsearch. Enter your Elasticsearch host address in the imports cell.
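
The README does not show how the delay data is written to Elasticsearch, and a Databricks notebook may well use a Spark connector instead. As a hedged, library-free sketch, the snippet below builds a request body in the NDJSON format of the Elasticsearch `_bulk` API (an action line followed by a document line, with a trailing newline) and posts it with `urllib`. The index name `dublin-delays` and the record fields are hypothetical, and depending on the Elasticsearch version a `_type` may also be required in the action line.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable


def to_bulk_body(docs: Iterable[Dict[str, Any]], index: str) -> str:
    """Build an Elasticsearch _bulk request body (NDJSON): one action line
    followed by one source line per document, ending with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def post_bulk(host: str, body: str, port: int = 9200) -> Dict[str, Any]:
    """POST the body to http://<host>:<port>/_bulk and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"http://{host}:{port}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical delay records; field names are illustrative only.
delays = [
    {"line_id": "L1", "stop_id": 101, "delay_sec": 120},
    {"line_id": "L2", "stop_id": 202, "delay_sec": -30},
]
body = to_bulk_body(delays, index="dublin-delays")
print(body, end="")
# post_bulk("<your-elasticsearch-host>", body)  # requires a running cluster
```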

To run the final app, choose one of the options at the top of the notebook:
  • For stream sources, enter your API in the "API" option.
  • For batch sources, enter the path to your JSON file in the "Json path" option.
  • For a single source, choose one of the bus stops listed in the "Source Bus Stop" option.
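
Exactly one of these options is meant to be filled in. A minimal sketch of that selection logic, assuming each option arrives as a string that is empty when unset, might look like this. On Databricks the values could come from `dbutils.widgets.get`, but the widget names and this function are hypothetical, not taken from 'final_app.ipynb'.

```python
from typing import Tuple


def resolve_input_mode(api: str = "", json_path: str = "", source_bus_stop: str = "") -> Tuple[str, str]:
    """Return (mode, value) for the single option that was filled in.

    Modes: "stream" (API), "batch" (Json path), "single" (Source Bus Stop).
    Raises ValueError unless exactly one option is set.
    """
    options = (("stream", api), ("batch", json_path), ("single", source_bus_stop))
    chosen = [(mode, value.strip()) for mode, value in options if value and value.strip()]
    if len(chosen) != 1:
        raise ValueError(f"Choose exactly one option; {len(chosen)} were set.")
    return chosen[0]


# On Databricks the values might come from widgets, e.g.
# resolve_input_mode(dbutils.widgets.get("API"), ...)  # widget names assumed
print(resolve_input_mode(json_path="/dbfs/tmp/buses.json"))  # hypothetical path
```

Rejecting zero or several options, rather than silently picking one, makes a misconfigured notebook fail early.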

The input data must contain the same DataFrame columns as the schema defined in the 'dublin data schema' cell of 'final_app.ipynb'.
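
Spark's `spark.read.json` expects one JSON object per line by default, so a quick pre-check of a batch file can be done line by line before loading it. The sketch below reports which records are missing required columns. The column names in `REQUIRED_COLUMNS` are placeholders, not the real 'dublin data schema', and should be replaced with the fields from that cell.

```python
import json
from typing import Iterable, List, Set, Tuple

# Hypothetical column names: replace them with the fields defined in the
# 'dublin data schema' cell of final_app.ipynb.
REQUIRED_COLUMNS: Set[str] = {"timestamp", "line_id", "stop_id", "delay", "lat", "lon"}


def validate_json_lines(lines: Iterable[str],
                        required: Set[str] = REQUIRED_COLUMNS) -> List[Tuple[int, List[str]]]:
    """Return (line number, sorted missing columns) for each record that lacks a required column."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = required - set(record)
        if missing:
            problems.append((line_no, sorted(missing)))
    return problems


good = json.dumps({column: 0 for column in REQUIRED_COLUMNS})
bad = json.dumps({"timestamp": "2018-01-01 08:00:00", "line_id": "L1"})
print(validate_json_lines([good, bad]))
# [(2, ['delay', 'lat', 'lon', 'stop_id'])]
```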

Warm up

To run the Warm Up part, run all the notebooks in the warmup_task directory in this order:
  1. 'preprocess_n_save_external_data.ipynb'
  2. 'train_lr_model_task_2.ipynb'
  3. 'train_lr_task_3.ipynb'
  4. 'warmup_final.ipynb'