[BUG] AnomalyCheck fails because AnalyzerOptions do not persist in FileSystemMetricsRepository #596

Open
kchaturvedi opened this issue Jan 27, 2025 · 0 comments
Labels
bug Something isn't working

kchaturvedi commented Jan 27, 2025

Describe the bug

When running anomaly checks via VerificationSuite with an InMemoryMetricsRepository and analyzerOptions configured on the Analyzer, all checks pass as expected.
However, after switching to FileSystemMetricsRepository with no other changes, the same checks fail.

To Reproduce

Given a dataframe such as:

val currentData = Seq(
        (1, "red"),
        (2, "blue"),
        (3, "green"),
        (4, "red"),
        (5, "red"),
        (6, "red"),
        (7, "red"),
        (8, "red"),
        (9, "green"),
        (10, "red")
).toDF("index", "values")

and a bare-minimum VerificationSuite with analyzerOptions, such as:

val verificationBuilder = VerificationSuite()
  .onData(currentData)
  .useRepository(metricsRepository)
  .saveOrAppendResult(ResultKey(LocalDateTime.now().toInstant(ZoneOffset.UTC).toEpochMilli))
  .addAnomalyCheck(
    BatchNormalStrategy(
      lowerDeviationFactor = Some(1),
      upperDeviationFactor = Some(1)
    ),
    Compliance(
      s"`values` is red",
      s"`values` IN ('red')",
      analyzerOptions = Some(
        AnalyzerOptions(
          filteredRow = FilteredRowOutcome.NULL
        )
      )
    ),
    None
  )
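
The snippet above does not show how `metricsRepository` is created. A minimal sketch of the two variants being compared, assuming a local SparkSession and a placeholder file path (neither is taken from the original report):

```scala
import com.amazon.deequ.repository.MetricsRepository
import com.amazon.deequ.repository.fs.FileSystemMetricsRepository
import com.amazon.deequ.repository.memory.InMemoryMetricsRepository
import org.apache.spark.sql.SparkSession

// Assumption: a local session; the report does not show its Spark setup.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("deequ-issue-596")
  .getOrCreate()

// Variant 1: metrics history is kept in memory for the lifetime of the JVM.
val inMemoryRepository: MetricsRepository = new InMemoryMetricsRepository()

// Variant 2: metrics history is serialized to a JSON file.
// The path is a placeholder, not the one used in the report.
val fileSystemRepository: MetricsRepository =
  FileSystemMetricsRepository(spark, "/tmp/deequ-metrics.json")

// Swap this one line to switch between the two runs.
val metricsRepository: MetricsRepository = inMemoryRepository
```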

Using InMemoryMetricsRepository

When using an InMemoryMetricsRepository, the check passes:

val results = verificationBuilder.run()
assert(results.status == CheckStatus.Success)
results.checkResults.head._2.constraintResults.head
ConstraintResult(
  AnomalyConstraint(
    Compliance(`values` is red,`values` IN ('red'),None,List(),Some(AnalyzerOptions(Ignore,NULL)))),
    Success,
    None,
    Some(DoubleMetric(Column,Compliance,`values` is red,Success(0.7),Some((values IN (red))))
  )
)

Using FileSystemMetricsRepository

However, when switching to a FileSystemMetricsRepository, the same execution fails:

results.checkResults.head._2.constraintResults.head
ConstraintResult(
  AnomalyConstraint(
    Compliance(`values` is red,`values` IN ('red'),None,List(),Some(AnalyzerOptions(Ignore,NULL)))),
    Failure,
    Some(Can't execute the assertion: requirement failed: Excluding values in searchInterval from calculation but not enough values remain to calculate mean and stdDev.!),
    Some(DoubleMetric(Column,Compliance,`values` is red,Success(0.7),Some((values IN (red))))
  )
)

Inspecting the metrics JSON file shows that analyzerOptions are not present in any of the historical metrics. Here is a sample of one of the results in the file:

{
    "resultKey": {
      "dataSetDate": 1720779126192,
      "tags": {}
    },
    "analyzerContext": {
      "metricMap": [
        {
          "analyzer": {
            "analyzerName": "Compliance",
            "instance": "`values` is red",
            "predicate": "`values` IN (\u0027red\u0027)",
            "columns": []
          },
          "metric": {
            "metricName": "DoubleMetric",
            "entity": "Column",
            "instance": "values",
            "name": "Compliance",
            "value": 0.7
          }
        }
      ]
    }
  }
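
One plausible reading of the failure is that the analyzer reloaded from the file no longer equals the analyzer in the current check, so the historical lookup finds nothing. The sketch below illustrates that mechanism only. `Options`, `ComplianceKey`, and `roundTrip` are hypothetical stand-ins, not Deequ classes, and the lossy round trip is an assumption based on the JSON sample above.

```scala
// Hypothetical stand-ins, NOT Deequ's real classes: they only model the
// equality behaviour suspected in this issue.
final case class Options(filteredRow: String)

final case class ComplianceKey(
  instance: String,
  predicate: String,
  options: Option[Options] = None
)

object RoundTripSketch {
  // Models a serializer that writes only `instance` and `predicate`,
  // as in the JSON sample above, so `options` is lost on reload.
  def roundTrip(key: ComplianceKey): ComplianceKey =
    ComplianceKey(key.instance, key.predicate)

  def main(args: Array[String]): Unit = {
    val current = ComplianceKey(
      "`values` is red",
      "`values` IN ('red')",
      Some(Options("NULL"))
    )

    // History as an in-memory store would keep it: keys are stored unchanged.
    val inMemory = Map(current -> 0.7)
    // History as reloaded from a file: keys went through the lossy round trip.
    val fromFile = Map(roundTrip(current) -> 0.7)

    println(inMemory.get(current)) // Some(0.7): the key matches
    println(fromFile.get(current)) // None: case-class equality includes `options`
  }
}
```

If this is what happens inside the repository, it would also explain why removing analyzerOptions makes the file-based run pass: both sides of the comparison then carry no options.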

Workarounds

When analyzerOptions are removed from the Analyzer and the FileSystemMetricsRepository is used, all checks pass as expected, because the current check's analyzer can then match the historical values in the repository.

val verificationBuilder = VerificationSuite()
  .onData(currentData)
  .useRepository(metricsRepository)
  .saveOrAppendResult(ResultKey(LocalDateTime.now().toInstant(ZoneOffset.UTC).toEpochMilli))
  .addAnomalyCheck(
    BatchNormalStrategy(
      lowerDeviationFactor = Some(1),
      upperDeviationFactor = Some(1)
    ),
    Compliance(
      s"`values` is red",
      s"`values` IN ('red')"
    ),
    None
  )

val results = verificationBuilder.run()
assert(results.status == CheckStatus.Success)

If analyzerOptions is required, using InMemoryMetricsRepository resolves this issue as well.

Expected behavior

  • AnalyzerOptions should be usable with FileSystemMetricsRepository.
  • Changing the metrics repository should not change the result of the verification suite, because the results stored in any MetricsRepository implementation should be identical and cross-compatible.