-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary Classification Confusion Matrix and AUC Aggregators #633
base: develop
Are you sure you want to change the base?
Binary Classification Confusion Matrix and AUC Aggregators #633
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #633 +/- ##
===========================================
+ Coverage 82% 82.25% +0.25%
===========================================
Files 111 113 +2
Lines 5156 5207 +51
Branches 457 479 +22
===========================================
+ Hits 4228 4283 +55
+ Misses 928 924 -4
Continue to review full report at Codecov.
|
Hey @richwhitjr, thank you for this! We should have some time to look today. |
*/ | ||
object AreaUnderCurve { | ||
private def trapezoid(points: Seq[(Double, Double)]): Double = { | ||
require(points.length == 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to take the two points directly instead of using this assertion, head, last combo?
* with different thresholds | ||
*/ | ||
case object CurveMonoid extends Monoid[Curve] { | ||
def zero = Curve(Nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looked like we don't have a test yet that exercises the zero
.
*/ | ||
case object CurveMonoid extends Monoid[Curve] { | ||
def zero = Curve(Nil) | ||
override def plus(left: Curve, right: Curve): Curve = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a sumOption implementation that would be more efficient that we should add here as well, for adding multiple curves at once?
extends Aggregator[BinaryPrediction, Curve, Double] | ||
with Serializable { | ||
|
||
private def linspace(a: Double, b: Double, length: Int = 100): Array[Double] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never call this without default args - should we remove the default 100?
} | ||
|
||
property("Curve is associative") { | ||
isAssociative[Curve] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the monoid laws?
* | ||
* @param matrices List of Matrices | ||
*/ | ||
case class Curve(matrices: List[ConfusionMatrix]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you add an object with an implicit def monoid: Monoid[Curve] = CurveMonoid
you won't need to declare the same in the tests.
@richwhitjr this is getting there - I think it's looking good. One ask that we have for all new data structures is that you add a section with a few examples to the documentation site. here are the instructions on how to do this: https://github.com/twitter/algebird/blob/develop/CONTRIBUTING.md#contributing-documentation All data structure examples are compiled by CI, so it's extremely helpful to get a mini tutorial in at the same time the data structure is merged. here are a few examples of existing pages: |
I looked at adding sumOption but couldn't think of a way to make it more efficient without a lot of complexity and not sure if it is entirely worth it. It would help with object creation but on the other side we would have to aggregate the confusion matrix variables outside into some type of dictionary and reconstruct it at the end. |
Richard Whitcomb seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Some of this works is derived from similar functions in the Spark library. For Binary Classifications tasks it will compute a confusion matrix. You can also aggregator over these confusions matrices with different thresholds to compute an Area under the Curve Stat for both PR and ROC.