Commit 57cbae7

Add readme
1 parent ea3a350 commit 57cbae7

1 file changed

+29
-0
lines changed

Diff for: issue_17/README.md

@@ -0,0 +1,29 @@
# Issue #17 Extract 'Overdue Inspections' Feature from Restaurant Inspection Data

This script summarizes the restaurant inspections data by year, week, census block, establishment type, and risk category, and creates two features: number of overdue inspections and average number of days since last inspection (inspection frequency). The second output is not part of the original issue and is added as an alternative feature to test.
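
The aggregation described above could be sketched roughly as follows. This is a minimal illustration, not the script itself: the field names, the `build_features` helper, and the interval values in `MAX_DAYS_BETWEEN` are all hypothetical, and it assumes an inspection counts as overdue when the gap since the establishment's previous inspection exceeds the interval for its risk category.

```python
from collections import defaultdict
from datetime import date

# Hypothetical placeholder intervals in days per risk category.
# These are NOT official DOH values.
MAX_DAYS_BETWEEN = {"high": 180, "medium": 365, "low": 730}


def build_features(inspections, max_days=MAX_DAYS_BETWEEN):
    """Aggregate inspections into per-group overdue counts and average gaps.

    `inspections` is an iterable of dicts with the (assumed) keys
    establishment_id, inspection_date, census_block, establishment_type
    and risk_category.
    """
    last_seen = {}  # establishment_id -> date of its previous inspection
    groups = defaultdict(lambda: {"overdue": 0, "gaps": []})

    for row in sorted(inspections, key=lambda r: r["inspection_date"]):
        est = row["establishment_id"]
        previous = last_seen.get(est)
        last_seen[est] = row["inspection_date"]
        if previous is None:
            # No earlier inspection in the data: days since last is "NA",
            # so the record is left out of the aggregates.
            continue

        gap = (row["inspection_date"] - previous).days
        iso = row["inspection_date"].isocalendar()  # (ISO year, ISO week, weekday)
        key = (iso[0], iso[1], row["census_block"],
               row["establishment_type"], row["risk_category"])
        groups[key]["gaps"].append(gap)
        if gap > max_days[row["risk_category"]]:
            groups[key]["overdue"] += 1

    return {
        key: {
            "n_overdue": g["overdue"],
            "avg_days_since_last": sum(g["gaps"]) / len(g["gaps"]),
        }
        for key, g in groups.items()
    }
```

The sketch groups by ISO year and ISO week; whether the script uses ISO or calendar weeks is not stated here, so treat that choice as an assumption too.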

## Instructions

Update the directory paths in the section labeled "UPDATE PATHS" before running the script.

## Inputs

Cleaned restaurant inspections and geocoding files:
* dc_restaurant_inspections/potential_inspection_summary_data.csv
* dc_restaurant_inspections/restaurant_inspections_geocoded.csv

## Outputs

CSV files with features:
* restaurant_inspections_overdue.csv
* restaurant_inspections_frequency.csv

Quick visualisations by week by risk category:
* restaurant_inspections_overdue.png
* restaurant_inspections_frequency.png

## Open issues

* Update inspection frequency requirements with official DOH values
* Decide what to do about the 'bias' towards fewer overdue inspections and more frequent inspections in earlier time periods. In the earlier time periods, most establishments would have an "NA" for days since last inspection, because their last inspection is not in the data set; only those that have had inspections very recently (i.e. since the start of the data set) would have a value and be included in the aggregate calculations.
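
One way to gauge the size of the second issue is to measure, per year, what share of inspections have no earlier inspection in the data and therefore get an "NA". The snippet below is only an illustrative diagnostic under that assumption; `na_share_by_year` and its input layout are hypothetical and not part of the script.

```python
from collections import defaultdict
from datetime import date


def na_share_by_year(inspections):
    """Return {year: share of inspections with no prior inspection in the data}.

    `inspections` is an iterable of (establishment_id, inspection_date) tuples.
    """
    seen = set()                # establishments already observed at least once
    totals = defaultdict(int)   # inspections per year
    missing = defaultdict(int)  # inspections per year with "NA" days since last

    for est, when in sorted(inspections, key=lambda r: r[1]):
        totals[when.year] += 1
        if est not in seen:
            # First appearance in the data set: the previous inspection
            # (if any) happened before the data starts, so the gap is unknown.
            missing[when.year] += 1
            seen.add(est)

    return {year: missing[year] / totals[year] for year in sorted(totals)}
```

A share that is high in the first years and falls afterwards would be consistent with the effect described above: early aggregates rest on the few establishments inspected twice shortly after the data set begins.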
