To help transfer institutional knowledge to a newer workforce, MassDEP is looking for an innovative knowledge transfer mechanism. Given a circumstance (the situation that led to a code violation being identified), the mechanism should point newer employees to the relevant citations (alphanumeric codes MassDEP uses to identify specific violations) by surfacing similar enforcement documents from the past.
The ultimate goal of this project is to develop a web application to automate the knowledge transfer. Smaller objectives include:
- Extracting circumstances and citations from unstructured documents.
- Performing analysis on document polarity and subjectivity.
- Enabling full-text search and similarity search on circumstances.
- Building a user-friendly web application.
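As an illustration of the full-text and similarity search objectives, the sketch below builds two Elasticsearch query bodies: a standard `match` query and a `more_like_this` query. The field name `circumstance` and the function names are assumptions made for this example and are not taken from the project code.

```python
def full_text_query(text, field="circumstance", size=10):
    """Build a request body for a standard `match` full-text query."""
    return {"size": size, "query": {"match": {field: text}}}


def similar_documents_query(text, field="circumstance", size=10):
    """Build a request body for a `more_like_this` query seeded with free text."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": [field],      # hypothetical field name
                "like": text,           # text to find similar documents for
                "min_term_freq": 1,     # keep terms that occur only once in `text`
                "min_doc_freq": 1,      # keep terms that are rare in the index
            }
        },
    }
```

Either body could be sent to the search endpoint of whichever index holds the circumstances.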
File Conversion
- Make sure pdf_to_txt.py and batch_pdf_to_txt.py are in the same directory as the folder containing the PDF files.
- Run batch_pdf_to_txt.py for folder-to-folder processing.
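Folder-to-folder processing can be pictured roughly as follows. This is a minimal sketch, not the contents of batch_pdf_to_txt.py: the `batch_convert` function and its `convert` argument are hypothetical, and `convert` stands in for whatever single-file conversion pdf_to_txt.py provides.

```python
from pathlib import Path


def batch_convert(src_dir, dst_dir, convert):
    """Convert every PDF in src_dir to a .txt file in dst_dir.

    `convert` is any callable that takes a PDF path and returns its text.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)   # create the output folder if needed
    written = []
    for pdf in sorted(src.glob("*.pdf")):    # non-PDF files are skipped
        out = dst / (pdf.stem + ".txt")      # report.pdf -> report.txt
        out.write_text(convert(pdf), encoding="utf-8")
        written.append(out)
    return written
```

Passing the converter in as an argument keeps the folder loop independent of the PDF library that does the extraction.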
Parsing
- 1_dataprocess_elements.ipynb
- file_parser.py
- parser_alex.py
Cleaning
- deep_clean.py
Exploratory Data Analysis
- 1basic_statistics.ipynb
- 2sentimental_analysis.ipynb
- Elasticsearch installation. https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html
MacOS: We recommend installing Elasticsearch with the Homebrew package manager.
i) Run the following commands from the command line.
brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
ii) Run the following commands from the command line to open the Elasticsearch configuration file.
cd /usr/local/etc/elasticsearch
open elasticsearch.yml
iii) Paste the following lines at the end of the yml file.
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: X-Requested-With, X-Auth-Token, Content-Type, Content-Length
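If you prefer not to edit the file by hand, the same settings could be appended with a short script. The sketch below is an assumption-laden helper, not part of the repository: `add_cors_settings` is a hypothetical name, and it simply appends any of the four CORS keys that are not already present so that running it twice does not duplicate them.

```python
from pathlib import Path

# The four settings from step iii), one per line.
CORS_SETTINGS = [
    'http.cors.enabled: true',
    'http.cors.allow-origin: "*"',
    'http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE',
    'http.cors.allow-headers: X-Requested-With, X-Auth-Token, Content-Type, Content-Length',
]


def add_cors_settings(config_path):
    """Append each CORS setting whose key is not already in the file."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8")
    # Collect the keys of all uncommented "key: value" lines.
    existing_keys = {
        line.split(":", 1)[0].strip()
        for line in text.splitlines()
        if ":" in line and not line.lstrip().startswith("#")
    }
    missing = [s for s in CORS_SETTINGS
               if s.split(":", 1)[0].strip() not in existing_keys]
    if missing:
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(missing) + "\n", encoding="utf-8")
    return missing
```

For the Homebrew install described above, the path to pass would be /usr/local/etc/elasticsearch/elasticsearch.yml.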
- Elasticsearch-browser installation. Elasticsearch-browser is needed for the front end. MacOS:
npm install elasticsearch-browser
- Install the Elasticsearch Python client. Run the following command from the command line. If you have more than one Python version installed, make sure the package is installed for the version you use in PyCharm.
pip install elasticsearch
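One way to confirm that the package landed in the right Python version is to run a small check with the same interpreter PyCharm is configured to use. This is a sketch using only the standard library; the `is_installed` helper is a name chosen for this example.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if `package_name` can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Shows which Python is running and whether it can see the client.
    print("Interpreter:", sys.executable)
    print("elasticsearch available:", is_installed("elasticsearch"))
```

If the check reports that the package is missing, installing with `python -m pip install elasticsearch`, using that same interpreter, targets the correct version.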
- Install npm and Node.js. https://docs.npmjs.com/downloading-and-installing-node-js-and-npm
- Install the Angular CLI.
npm install -g @angular/cli
- Download the 19fall-GQP repository.
- Start Elasticsearch. Run the following from the command line:
elasticsearch
- CSV files are included in the /data folder. To import the data into Elasticsearch, first make sure Elasticsearch is running and reachable, then run /src/es-load.py. Once the script runs successfully, you will see output like the screenshots below.
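For readers curious about what a CSV import involves, the sketch below turns CSV rows into the action dictionaries that the Python client's bulk helper accepts. It is not the contents of es-load.py: the function name, the index name, and the use of the row number as the document ID are all assumptions for illustration.

```python
import csv


def csv_to_actions(csv_path, index_name):
    """Yield one bulk-index action per CSV row, using the header as field names."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row_number, row in enumerate(csv.DictReader(handle)):
            yield {
                "_index": index_name,   # hypothetical index name chosen by the caller
                "_id": row_number,      # assumption: row position as document ID
                "_source": dict(row),   # column name -> cell value
            }


# With Elasticsearch running, the actions could be loaded like this:
#   from elasticsearch import Elasticsearch, helpers
#   client = Elasticsearch("http://localhost:9200")
#   helpers.bulk(client, csv_to_actions("data/example.csv", "example-index"))
# The file name and index name above are placeholders.
```

Because the function is a generator, rows are streamed to the bulk helper without loading the whole file into memory.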
- Install dependencies. Go to /web and run:
npm install
It is very common to see warnings and errors during this step. We include some examples in the troubleshooting section.
- Start the web app. Under /web, run:
ng serve
Compilation may take a while. If it is successful, the terminal reports a successful compilation.
It is also very common to see warnings and errors during this step. We include some examples in the troubleshooting section.
- Stop Elasticsearch and the web app. Press Control+C in both command line windows.
Fix: Run the following command from the command line.
sudo npm install -g @angular/cli@latest
Then run ng serve from the command line.
Fix: Open the package.json file under /web and change the "@angular/compiler-cli" version as shown in the screenshot below.
Then run npm install and ng serve from the command line.
Fix: Open the package.json file and change the rxjs and TypeScript versions as shown in the screenshot below.
Next, go to the project folder and delete the node_modules folder.
After the deletion, run npm install and ng serve from the command line.
This project was generated with Angular CLI version 6.0.3.
- Alex (ilovemanu)
- Ada (ZhiyiHuanghzy)
- Achu (ekshej)
- Henry (henryji96)