An important task and skill in data cleaning is getting to "know" your data. This process takes considerable time, and the resulting knowledge is often fragmented: fragmented in individual observations that differ from implicit expectations, leading to anecdotal knowledge, but also fragmented across the members of a data science team.
An alternative approach is to develop an explicit set of data validation rules: rules that state when data is valid for your analysis. This approach is part of the validate package suite.

Many of the validate-related packages do not address the process of creating rule sets: they simply assume that the rules are available and can be used to check, correct or impute your data.
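For example, a rule set can be defined with validate and confronted with a data set. The snippet below uses the built-in `cars` data and an illustrative upper bound of 200 on `dist`:

```r
library(validate)

# explicit rules stating when a record is considered valid
rules <- validator(
  speed >= 0,
  dist  >= 0,
  dist  <= 200   # illustrative upper bound
)

# confront the data with the rules and summarise pass/fail counts
cf <- confront(cars, rules)
summary(cf)
```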
deriverules is about creating data validation rule sets for new data sets: it aims to help bootstrap the rule-finding process.
- Use "clean" data to derive boundaries for the data, e.g. univariate.
- Use covariance to derive linear equalities in the data.
- Use outlier techniques to derive data validation rules.
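A minimal sketch of the first strategy, using base R only: derive univariate range rules from a data set that is assumed to be clean. The generated rule texts are illustrative and are not the deriverules API.

```r
# derive univariate range rules from a "clean" reference data set
clean <- cars   # assumed to contain only valid records

range_rules <- sapply(names(clean), function(v) {
  r <- range(clean[[v]])
  sprintf("%s >= %g & %s <= %g", v, r[1], v, r[2])
})
cat(range_rules, sep = "\n")
# the resulting rule texts can be pasted into a validate::validator() call
```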
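A sketch of the second strategy: eigenvectors of the covariance matrix with (near-)zero eigenvalues correspond to linear combinations of variables that are (almost) constant across records, and can therefore be read as linear equality rules. The data below are constructed purely for illustration.

```r
# example data with an exact linear relation: total == x + y
set.seed(1)
d <- data.frame(x = rnorm(100), y = rnorm(100))
d$total <- d$x + d$y

eg  <- eigen(cov(d))
tol <- 1e-8

# each eigenvector with a near-zero eigenvalue defines a linear equality
for (i in which(eg$values < tol)) {
  v   <- eg$vectors[, i]
  lhs <- paste(sprintf("%+.2f*%s", v, names(d)), collapse = " ")
  cat("rule:", lhs, "==", sprintf("%.2f", sum(v * colMeans(d))), "\n")
}
```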
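A sketch of the third strategy, with the simple boxplot/IQR criterion standing in for any outlier detection method: bounds derived this way are typically wider and more tolerant than the observed minimum and maximum.

```r
# turn the IQR-based outlier criterion into a range rule for one variable
iqr_rule <- function(x, var, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  sprintf("%s >= %g & %s <= %g", var, q[1] - k * diff(q), q[2] + k * diff(q))
}

cat(iqr_rule(cars$speed, "speed"), "\n")
cat(iqr_rule(cars$dist, "dist"), "\n")
```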