Commit

Fixed table of contents, references to numpy arrays.
AnotherSamWilson committed Jul 28, 2024
1 parent 95c4427 commit cea5b51
Showing 3 changed files with 71 additions and 112 deletions.
93 changes: 37 additions & 56 deletions README.md
@@ -29,7 +29,7 @@ with lightgbm. The R version of this package may be found
- Has efficient mean matching solutions.
- Can utilize GPU training
- **Flexible**
- Can impute pandas dataframes and numpy arrays
- Can impute pandas dataframes
- Handles categorical data automatically
- Fits into a sklearn pipeline
- User can customize every aspect of the imputation process
@@ -48,58 +48,37 @@ you can find

#### Table of Contents:

- [Package
Meta](https://github.com/AnotherSamWilson/miceforest#Package-Meta)
- [The
Basics](https://github.com/AnotherSamWilson/miceforest#The-Basics)
- [Basic
Examples](https://github.com/AnotherSamWilson/miceforest#Basic-Examples)
- [Customizing LightGBM
Parameters](https://github.com/AnotherSamWilson/miceforest#Customizing-LightGBM-Parameters)
- [Available Mean Match
Schemes](https://github.com/AnotherSamWilson/miceforest#Controlling-Tree-Growth)
- [Imputing New Data with Existing
Models](https://github.com/AnotherSamWilson/miceforest#Imputing-New-Data-with-Existing-Models)
- [Saving and Loading
Kernels](https://github.com/AnotherSamWilson/miceforest#Saving-and-Loading-Kernels)
- [Implementing sklearn
Pipelines](https://github.com/AnotherSamWilson/miceforest#Implementing-sklearn-Pipelines)
- [Advanced
Features](https://github.com/AnotherSamWilson/miceforest#Advanced-Features)
- [Customizing the Imputation
Process](https://github.com/AnotherSamWilson/miceforest#Customizing-the-Imputation-Process)
- [Building Models on Nonmissing
Data](https://github.com/AnotherSamWilson/miceforest#Building-Models-on-Nonmissing-Data)
- [Tuning
Parameters](https://github.com/AnotherSamWilson/miceforest#Tuning-Parameters)
- [On
Reproducibility](https://github.com/AnotherSamWilson/miceforest#On-Reproducibility)
- [How to Make the Process
Faster](https://github.com/AnotherSamWilson/miceforest#How-to-Make-the-Process-Faster)
- [Imputing Data In
Place](https://github.com/AnotherSamWilson/miceforest#Imputing-Data-In-Place)
- [Diagnostic
Plotting](https://github.com/AnotherSamWilson/miceforest#Diagnostic-Plotting)
- [Imputed
Distributions](https://github.com/AnotherSamWilson/miceforest#Distribution-of-Imputed-Values)
- [Correlation
Convergence](https://github.com/AnotherSamWilson/miceforest#Convergence-of-Correlation)
- [Variable
Importance](https://github.com/AnotherSamWilson/miceforest#Variable-Importance)
- [Mean
Convergence](https://github.com/AnotherSamWilson/miceforest#Variable-Importance)
- [Benchmarks](https://github.com/AnotherSamWilson/miceforest#Benchmarks)
- [Using the Imputed
Data](https://github.com/AnotherSamWilson/miceforest#Using-the-Imputed-Data)
- [The MICE
Algorithm](https://github.com/AnotherSamWilson/miceforest#The-MICE-Algorithm)
- [Introduction](https://github.com/AnotherSamWilson/miceforest#The-MICE-Algorithm)
- [Common Use
Cases](https://github.com/AnotherSamWilson/miceforest#Common-Use-Cases)
- [Predictive Mean
Matching](https://github.com/AnotherSamWilson/miceforest#Predictive-Mean-Matching)
- [Effects of Mean
Matching](https://github.com/AnotherSamWilson/miceforest#Effects-of-Mean-Matching)
This document contains a thorough walkthrough of the package,
benchmarks, and an introduction to multiple imputation. More information
on MICE can be found in Stef van Buuren’s excellent online book, which
you can find
[here](https://stefvanbuuren.name/fimd/ch-introduction.html).

#### Table of Contents:

- [Classes](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#classes)
- [Basic Usage](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#basic-usage)
- [Example](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#basic-usage)
- [Customizing LightGBM Parameters](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#customizing-lightgbm-parameters)
- [Available Mean Match Schemes](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#adjusting-the-mean-matching-scheme)
- [Imputing New Data with Existing Models](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#imputing-new-data-with-existing-models)
- [Saving and Loading Kernels](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#saving-and-loading-kernels)
- [Implementing sklearn Pipelines](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#saving-and-loading-kernels)
- [Advanced Features](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#advanced-features)
- [Building Models on Nonmissing Data](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#building-models-on-nonmissing-data)
- [Tuning Parameters](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#tuning-parameters)
- [On Reproducibility](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#on-reproducibility)
- [How to Make the Process Faster](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#how-to-make-the-process-faster)
- [Imputing Data In Place](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#imputing-data-in-place)
- [Diagnostic Plotting](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#diagnostic-plotting)
- [Feature Importance](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#feature-importance)
- [Imputed Distributions](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#plot-imputed-distributions)
- [Using the Imputed Data](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#using-the-imputed-data)
- [The MICE Algorithm](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#the-mice-algorithm)
- [Introduction](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#the-mice-algorithm)
- [Common Use Cases](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#common-use-cases)
- [Predictive Mean Matching](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#predictive-mean-matching)
- [Effects of Mean Matching](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#effects-of-mean-matching)

## Installation

@@ -350,7 +329,7 @@ cust_kernel.mice(
)
```

### Imputing New Data with Existing Models
## Imputing New Data with Existing Models

Multiple Imputation can take a long time. If you wish to impute a
dataset using the MICE algorithm, but don’t have time to train new
@@ -434,6 +413,8 @@ assert not np.any(np.isnan(X_train_t))
assert not np.any(np.isnan(X_test_t))
```

# Advanced Features

## Building Models on Nonmissing Data

The MICE process itself is used to impute missing data in a dataset.
@@ -843,9 +824,9 @@ print(iris_amp.isnull().sum(0))
dtype: int64


## Diagnostic Plotting
# Diagnostic Plotting

As of now, there is 2 diagnostic plot available. More coming soon!
As of now, there are 2 diagnostic plot available. More coming soon!

### Feature Importance

88 changes: 33 additions & 55 deletions README_gen.ipynb
@@ -67,58 +67,29 @@
"\n",
"#### Table of Contents:\n",
"\n",
" - [Package\n",
" Meta](https://github.com/AnotherSamWilson/miceforest#Package-Meta)\n",
" - [The\n",
" Basics](https://github.com/AnotherSamWilson/miceforest#The-Basics)\n",
" - [Basic\n",
" Examples](https://github.com/AnotherSamWilson/miceforest#Basic-Examples)\n",
" - [Customizing LightGBM\n",
" Parameters](https://github.com/AnotherSamWilson/miceforest#Customizing-LightGBM-Parameters)\n",
" - [Available Mean Match\n",
" Schemes](https://github.com/AnotherSamWilson/miceforest#Controlling-Tree-Growth)\n",
" - [Imputing New Data with Existing\n",
" Models](https://github.com/AnotherSamWilson/miceforest#Imputing-New-Data-with-Existing-Models)\n",
" - [Saving and Loading\n",
" Kernels](https://github.com/AnotherSamWilson/miceforest#Saving-and-Loading-Kernels)\n",
" - [Implementing sklearn\n",
" Pipelines](https://github.com/AnotherSamWilson/miceforest#Implementing-sklearn-Pipelines)\n",
" - [Advanced\n",
" Features](https://github.com/AnotherSamWilson/miceforest#Advanced-Features)\n",
" - [Customizing the Imputation\n",
" Process](https://github.com/AnotherSamWilson/miceforest#Customizing-the-Imputation-Process)\n",
" - [Building Models on Nonmissing\n",
" Data](https://github.com/AnotherSamWilson/miceforest#Building-Models-on-Nonmissing-Data)\n",
" - [Tuning\n",
" Parameters](https://github.com/AnotherSamWilson/miceforest#Tuning-Parameters)\n",
" - [On\n",
" Reproducibility](https://github.com/AnotherSamWilson/miceforest#On-Reproducibility)\n",
" - [How to Make the Process\n",
" Faster](https://github.com/AnotherSamWilson/miceforest#How-to-Make-the-Process-Faster)\n",
" - [Imputing Data In\n",
" Place](https://github.com/AnotherSamWilson/miceforest#Imputing-Data-In-Place)\n",
" - [Diagnostic\n",
" Plotting](https://github.com/AnotherSamWilson/miceforest#Diagnostic-Plotting)\n",
" - [Imputed\n",
" Distributions](https://github.com/AnotherSamWilson/miceforest#Distribution-of-Imputed-Values)\n",
" - [Correlation\n",
" Convergence](https://github.com/AnotherSamWilson/miceforest#Convergence-of-Correlation)\n",
" - [Variable\n",
" Importance](https://github.com/AnotherSamWilson/miceforest#Variable-Importance)\n",
" - [Mean\n",
" Convergence](https://github.com/AnotherSamWilson/miceforest#Variable-Importance)\n",
" - [Benchmarks](https://github.com/AnotherSamWilson/miceforest#Benchmarks)\n",
" - [Using the Imputed\n",
" Data](https://github.com/AnotherSamWilson/miceforest#Using-the-Imputed-Data)\n",
" - [The MICE\n",
" Algorithm](https://github.com/AnotherSamWilson/miceforest#The-MICE-Algorithm)\n",
" - [Introduction](https://github.com/AnotherSamWilson/miceforest#The-MICE-Algorithm)\n",
" - [Common Use\n",
" Cases](https://github.com/AnotherSamWilson/miceforest#Common-Use-Cases)\n",
" - [Predictive Mean\n",
" Matching](https://github.com/AnotherSamWilson/miceforest#Predictive-Mean-Matching)\n",
" - [Effects of Mean\n",
" Matching](https://github.com/AnotherSamWilson/miceforest#Effects-of-Mean-Matching)"
" - [Classes](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#classes)\n",
" - [Basic Usage](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#basic-usage)\n",
" - [Example](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#basic-usage)\n",
" - [Customizing LightGBM Parameters](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#customizing-lightgbm-parameters)\n",
" - [Available Mean Match Schemes](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#adjusting-the-mean-matching-scheme)\n",
" - [Imputing New Data with Existing Models](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#imputing-new-data-with-existing-models)\n",
" - [Saving and Loading Kernels](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#saving-and-loading-kernels)\n",
" - [Implementing sklearn Pipelines](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#saving-and-loading-kernels)\n",
" - [Advanced Features](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#advanced-features)\n",
" - [Building Models on Nonmissing Data](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#building-models-on-nonmissing-data)\n",
" - [Tuning Parameters](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#tuning-parameters)\n",
" - [On Reproducibility](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#on-reproducibility)\n",
" - [How to Make the Process Faster](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#how-to-make-the-process-faster)\n",
" - [Imputing Data In Place](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#imputing-data-in-place)\n",
" - [Diagnostic Plotting](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#diagnostic-plotting)\n",
" - [Feature Importance](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#feature-importance)\n",
" - [Imputed Distributions](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#plot-imputed-distributions)\n",
" - [Using the Imputed Data](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#using-the-imputed-data)\n",
" - [The MICE Algorithm](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#the-mice-algorithm)\n",
" - [Introduction](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#the-mice-algorithm)\n",
" - [Common Use Cases](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#common-use-cases)\n",
" - [Predictive Mean Matching](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#predictive-mean-matching)\n",
" - [Effects of Mean Matching](https://github.com/AnotherSamWilson/miceforest?tab=readme-ov-file#effects-of-mean-matching)"
]
},
{
@@ -475,7 +446,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imputing New Data with Existing Models\n",
"## Imputing New Data with Existing Models\n",
"\n",
"Multiple Imputation can take a long time. If you wish to impute a\n",
"dataset using the MICE algorithm, but don’t have time to train new\n",
@@ -586,6 +557,13 @@
"assert not np.any(np.isnan(X_test_t))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Advanced Features"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1128,9 +1106,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostic Plotting\n",
"# Diagnostic Plotting\n",
"\n",
"As of now, there is 2 diagnostic plot available. More coming soon!"
"As of now, there are 2 diagnostic plot available. More coming soon!"
]
},
{
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -49,7 +49,7 @@ Equations (MICE) with lightgbm. The R version of this package may be found
- Has efficient mean matching solutions.
- Can utilize GPU training
- **Flexible**
- Can impute pandas dataframes and numpy arrays
- Can impute pandas dataframes
- Handles categorical data automatically
- Fits into a sklearn pipeline
- User can customize every aspect of the imputation process