Creating a Deep Learning Neural Network to Analyze and Classify the Success of Charitable Donations
The purpose of this analysis is to build a deep learning neural network with at least 75% predictive accuracy in determining the success or failure of charitable donations to non-profit companies. This neural network is built to help a foundation decide, based on historical data, which companies should receive future charitable grants. The dataset provided contains information on 34,000 organizations and captures metadata, both categorical and continuous features, that could potentially be used to build a model with high predictive accuracy.
Features included:
- EIN and NAME—Identification columns
- APPLICATION_TYPE—Alphabet Soup application type
- AFFILIATION—Affiliated sector of industry
- CLASSIFICATION—Government organization classification
- USE_CASE—Use case for funding
- ORGANIZATION—Organization type
- STATUS—Active status
- INCOME_AMT—Income classification
- SPECIAL_CONSIDERATIONS—Special consideration for application
- ASK_AMT—Funding amount requested
Target variable:
- IS_SUCCESSFUL—Was the money used effectively
- Data Source: charity_data.csv
- Data Tools: AlphabetSoupCharity.ipynb, AlphabetSoupCharity_Optimization
- Software: imbalanced-learn, scikit-learn, TensorFlow, Jupyter Notebook, and Python 9.2.3
- Deliverable 1: Preprocessing Data for a Neural Network Model
- Deliverable 2: Compile, Train, and Evaluate the Model
- Deliverable 3: Optimize the Model
- Deliverable 4: A Written Report on the Neural Network Model (README.md)
Data Preprocessing
- Target variable:
  - "IS_SUCCESSFUL"
- Feature variables:
  - "APPLICATION_TYPE"
  - "AFFILIATION"
  - "CLASSIFICATION"
  - "USE_CASE"
  - "ORGANIZATION"
  - "STATUS"
  - "INCOME_AMT"
  - "SPECIAL_CONSIDERATIONS"
  - "ASK_AMT"
- Removed/dropped variables:
  - "EIN"
  - "NAME"
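The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: a tiny stand-in DataFrame replaces `charity_data.csv`, and the rare-category cutoffs are assumptions chosen for the toy data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# In the notebook this would be: df = pd.read_csv("charity_data.csv").
# A tiny stand-in frame with the same column names is used here for illustration.
df = pd.DataFrame({
    "EIN": [1, 2, 3, 4, 5, 6, 7, 8],
    "NAME": list("ABCDEFGH"),
    "APPLICATION_TYPE": ["T3", "T3", "T3", "T4", "T4", "T9", "T3", "T4"],
    "CLASSIFICATION": ["C1", "C1", "C2", "C1", "C2", "C7", "C1", "C2"],
    "ASK_AMT": [5000, 5000, 10000, 5000, 20000, 5000, 5000, 8000],
    "IS_SUCCESSFUL": [1, 0, 1, 1, 0, 0, 1, 0],
})

# Drop the identification columns, which carry no predictive signal.
df = df.drop(columns=["EIN", "NAME"])

# Bucket rare categorical values into "Other" to curb one-hot dimensionality
# (the cutoffs here are illustrative assumptions for the toy data).
for col, cutoff in [("APPLICATION_TYPE", 2), ("CLASSIFICATION", 2)]:
    counts = df[col].value_counts()
    rare = counts[counts < cutoff].index
    df[col] = df[col].replace(rare, "Other")

# One-hot encode the categorical features, split off the target,
# then train/test split and standardize the features.
df = pd.get_dummies(df)
y = df["IS_SUCCESSFUL"]
X = df.drop(columns=["IS_SUCCESSFUL"])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Scaling is fitted on the training split only, so no information from the test set leaks into the transform.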
Compiling, Training, and Evaluating the Model
Best performance characteristics:
- Number of hidden layers: 2
- Number of neurons per layer: 100 (hidden layer 1), 40 (hidden layer 2)
- Activation functions used: ReLU, ReLU, sigmoid (output layer)
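The architecture described above can be sketched in Keras as below. The input width is a placeholder assumption; in the notebook it would be taken from the scaled feature matrix.

```python
import tensorflow as tf

# Placeholder; in the notebook this would be X_train_scaled.shape[1].
n_features = 40

# Two hidden layers (100 and 40 neurons, ReLU) and a sigmoid output layer.
nn = tf.keras.models.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(40, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Binary crossentropy matches the single sigmoid output for a yes/no target.
nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
nn.summary()
```

Training would then call `nn.fit(X_train_scaled, y_train, epochs=...)` and evaluation `nn.evaluate(X_test_scaled, y_test)`.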
Attempts Taken to Improve Performance:
- Attempt 1:
- Attempt 2:
Overall, likely due to overfitting and to the reduced variance from binning rare variable values, my model was not able to achieve the 75% predictive accuracy threshold requested by the challenge prompt. Through the process of attempting to optimize the model, I saw firsthand that it is sometimes preferable to keep the model less complicated, with fewer hidden layers, epochs, and neurons. The complexity of a deep learning neural network should match that of the input data, corroborating the notion that a deep learning model often needs only a couple of hidden layers to achieve high predictive accuracy on both training and testing data. For further analysis, I would use an SVM model to solve this classification problem (predicting effective charity donations). SVMs are beneficial because they specialize in predicting a single output or target variable, which is what this dataset calls for. SVMs also build adequate models from linear or non-linear data, a benefit when working with categorical metadata.
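The proposed SVM follow-up could be sketched with scikit-learn's `SVC` as below. Synthetic data stands in for the scaled, one-hot-encoded charity features, so the printed score only demonstrates the workflow, not the real dataset's performance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded charity features and target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))             # stand-in feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in for IS_SUCCESSFUL

# Scale features, then fit an SVM with a non-linear (RBF) kernel,
# which can handle both linear and non-linear decision boundaries.
X_scaled = StandardScaler().fit_transform(X)
svm = SVC(kernel="rbf")
svm.fit(X_scaled, y)
print(f"training accuracy: {svm.score(X_scaled, y):.3f}")
```

On the real data, `X_scaled` and `y` would simply be replaced by the preprocessed feature matrix and the `IS_SUCCESSFUL` column.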