GitHub - Kandy16/india-school-analysis

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
csv-json		csv-json
india-age-demography		india-age-demography
leaflet-india-states		leaflet-india-states
new-data-set		new-data-set
new-facilities		new-facilities
old-data-set		old-data-set
old-facilities		old-facilities
#full-time-teachers-initial.csv		#full-time-teachers-initial.csv
#govt-schools-initial.csv		#govt-schools-initial.csv
#pucca-building-type-schools.csv		#pucca-building-type-schools.csv
#schools-initial.csv		#schools-initial.csv
#schools.csv		#schools.csv
#students-initial.csv		#students-initial.csv
#students.csv		#students.csv
#teachers-initial.csv		#teachers-initial.csv
#teachers.csv		#teachers.csv
1-#teachers-analysis.ipynb		1-#teachers-analysis.ipynb
2-#school-analysis.ipynb		2-#school-analysis.ipynb
3-#students-analysis.ipynb		3-#students-analysis.ipynb
4-student-pass-percentage.ipynb		4-student-pass-percentage.ipynb
5-full-time-teachers-percentage.ipynb		5-full-time-teachers-percentage.ipynb
6-govt-private-schools-analysis.ipynb		6-govt-private-schools-analysis.ipynb
7-pucca-building-school-analysis.ipynb		7-pucca-building-school-analysis.ipynb
README		README
boundary-wall-schools-percentage.csv		boundary-wall-schools-percentage.csv
computer-availability-schools-percentage.csv		computer-availability-schools-percentage.csv
consolidate-all-data.ipynb		consolidate-all-data.ipynb
data-cleaning-new-data-set.ipynb		data-cleaning-new-data-set.ipynb
dataStory.docx		dataStory.docx
dataStory.odt		dataStory.odt
developer-notes.txt		developer-notes.txt
drinking-water-schools-percentage.csv		drinking-water-schools-percentage.csv
electricitiy-connection-schools-percentage.csv		electricitiy-connection-schools-percentage.csv
extract-for-color-range.ipynb		extract-for-color-range.ipynb
final-color-range.json		final-color-range.json
final-data.csv		final-data.csv
full-time-teachers-percentage.csv		full-time-teachers-percentage.csv
govt-schools-percentage.csv		govt-schools-percentage.csv
hand-wash-facility-percentage.csv		hand-wash-facility-percentage.csv
literacy-rate.csv		literacy-rate.csv
mid-day-meal-school-percentage.csv		mid-day-meal-school-percentage.csv
play-ground-school-percentage.csv		play-ground-school-percentage.csv
private-schools-percentage.csv		private-schools-percentage.csv
pucca-building-type-schools-percentage.csv		pucca-building-type-schools-percentage.csv
school-drop-out-rate.csv		school-drop-out-rate.csv
students-pass-percentage.csv		students-pass-percentage.csv
text-books-received-schools-percentage.csv		text-books-received-schools-percentage.csv
toilet-facility-schools-percentage.csv		toilet-facility-schools-percentage.csv
trained-teachers-percentage.csv		trained-teachers-percentage.csv

Repository files navigation

This file will give you information regarding what programs does what in the repository and what is the sequence of actions followed to prepare the final data

1) 'Old-data-set' and 'Old-facilities' are the folders which contain the data organized in some hierarchy. Each file is downloaded from data.gov.in portal. A copy of those two folders are made with name changed as 'new-data-set' and 'new-facilities'

2) Data cleaning and preparation step. Bringing standardisation in names used to represent the state names.

'data-cleaning-new-data-set.ipynb' notebook does the whole job and updates all the files in the 'new-data-set' and 'new-facilities' folder

If you want to run it and get a feel of how the changes have happened then delete the new-data-set folder and copy and rename old-data-set and run the program in jupyter notebook and see the step by the step process.

3) We selected 20 attributes by looking the data set and each information is present in a single file.
Some attributes require normalisation against the expected number of students and so it is calculated inside the folder 'india-age-demography'. We considered the people who are of age between 3 and 20 and consider them to be expected number of students.

Out of the 20 attributes, we have # of shools, students and teachers, which will be high in number for states with large populations. We normalize their counts for 100 expected students in each state. We plan to increase the count 100 to improve the representation of information, if required.

Individual custom programs (python notebook files) have been written (if required) to adapt the information as required and stored separately in the same home directory.

Say for e.g '1-#teachers-analysis.ipynb' program takes the teachers data from 'new-data-set' folder and uses the calculated expected number of students from 'india-age-demography' and produces a normalized info and store it in '#teacher.csv'

Sometimes, the required data is copied by hand and stored in file in parallel to other data files.

4) 'consolidate-all-data.ipynb' access each file and consolidates into whole data and create 'final-data.csv'

5) 'csv-json' combines the visual representation of states (state map co-ordinates) and the 'final-data' into json format which is easy to program using Web Front end technologies