Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[azure logs] enable azure-eventhub input v2 #12802

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

zmoog
Copy link
Contributor

@zmoog zmoog commented Feb 17, 2025

Proposed commit message

The PR adds the new processor_version configuration option (an related ones) to enable the azure-eventhub input v2 in the Azure Logs integration v2 preview.

  • We increase the min stack version to 8.15.1.
  • Input v2 is only available to for integration v2 preview (no integration v1 support)
  • Add storage_account_connection_string config option (required for input v2); the policy template builds a default value using the storage_account_key but also offers an override option.

The input v2 uses the modern Event Hub SDK.

Notes for reviewers

In this PR, I am adding the input v2 settings to the advanced section of the integration v2 only (events data stream).

The goal is to enable only input v2 for the data stream that can avoid contention among inputs.

However, nothing can stop users from enabling v2 AND one or more v1 integrations. So, adding the input v2 settings to the global scope is better for simplicity.

I would love to hear your thoughts about placing the input v2 settings: data stream vs. global level.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • Update the integration docs

How to test this PR locally

  • Bootstrap a local stack using elastic-package (min stack version 8.15.1)
  • Build the Azure Logs package
  • Install the Azure Logs integration v2 preview
  • In the integration settings > advanced > set processor_version to v2

Related issues

Screenshots

CleanShot 2025-02-21 at 10 45 33@2x

CleanShot 2025-02-21 at 10 55 02@2x

CleanShot 2025-02-18 at 23 50 05@2x

@zmoog zmoog added enhancement New feature or request Integration:azure Azure Logs Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] labels Feb 17, 2025
@zmoog zmoog self-assigned this Feb 17, 2025
@zmoog zmoog linked an issue Feb 17, 2025 that may be closed by this pull request
@zmoog zmoog changed the title [azure logs] Enable azure-eventhub input v2 configuration options [azure logs] enable azure-eventhub input v2 Feb 17, 2025
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@zmoog zmoog marked this pull request as ready for review February 18, 2025 22:24
@zmoog zmoog requested review from a team as code owners February 18, 2025 22:25
@zmoog zmoog requested review from muthu-mps and alaudazzi February 18, 2025 22:25
@MichaelKatsoulis MichaelKatsoulis self-requested a review February 19, 2025 12:45
@zmoog zmoog added the Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] label Feb 20, 2025
@elasticmachine
Copy link

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@@ -480,6 +501,47 @@ Examples:

This setting can also be used to define your own endpoints, like for hybrid cloud models.

### Input v2 settings (advanced)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is "input v2" and "integration v2" used in this document. Are they referring to the same thing? If so, I'd suggest deciding on one and using it consistently. If not, add some text briefly but clearly explaining their relationship.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you asked because I need a perspective from someone unfamiliar with these things.

"input v2" and "integration v2" are two different things.

Let me share more; probably, you can help me with this naming problem.

  • Integration v2: is a new integration w/ routing within the Azure Logs package; we designed it to replace all other integrations with only one input per event hub instead of multiple ones.
  • Input v2: is the new version of the Filebeat azure-eventhub input that uses the new Event Hub SDK.

The azure-eventhub input supports the legacy (v1) and the modern (v2) SDK and can switch from using a config option. See https://github.com/elastic/beats/blob/cd883f511c3cc5664a15f0fd66fe6a83ae655c27/x-pack/filebeat/input/azureeventhub/input.go#L70-L77 for more.

Even if some users know the internals, we usually do not mention inputs in integration docs. However, I would love to convey that they can stay with the old v1 'engine' or pick the new v2.

Instead of 'input,' I considered using the name 'processor.' The legacy and modern SDKs have the concept of 'processor,' so it makes sense to use it.

Let me know which name you think is clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense.

Not a direct answer, but when I am talking to others who don't deal with the internals, I use the term "collector" to describe inputs because it explains what the function of the entity is. This term is currently unladen in the context of beats and so is reasonably safe (though this may change due to OTel, and adding the "data" prefix is probably a good idea). To maintain this agnosticism, you'd try to avoid putting it in a setting where it gains a life, so for the heading something like "New data collector settings (advanced)" or "Version 2 data collector sett…" or "Azure SDK v2 data collec…" (versus "Legacy data coll…").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, "data collector" is a good name; I regret not considering it when we added the new Event Hub SDK in Beats.

We added the "processor" concept to the azure-eventhub input in Beats 8.15 to have both legacy v1 and modern v2 side by side. I feel this "processor" has become part of the API, and we may need to stick to it.

Having a versioned input component should stay an advanced thing. I anticipate the processor v2 graduating as the default processor in 6-9 months as we add more v2-only features like Entra ID authentication options (service principal, workload identity, etc).

Is "processor" a better component name than "input," and is it good enough for an advanced scope setting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern about "processor" is that it's overloaded with these. But, having said that, I have no answers, only questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal. I removed the 'input' concept from the docs and replaced it with 'processor'.

I removed 'input' because technically, we have one input, with a configurable internal component called 'processor.'

Both legacy and modern SDKs identify the 'processor' as the internal component responsible for processing messages fetched by event hub partition consumers.

zmoog and others added 2 commits February 21, 2025 01:31
@@ -20,9 +20,33 @@ storage_account: {{storage_account}}
{{#if storage_account_key}}
storage_account_key: {{storage_account_key}}
{{/if}}
{{#if storage_account_connection_string}}
: {{storage_account_connection_string}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not processing the storage_account_connection_string through user input?
If this values doesn't come as input from the user, Then we could remove the IF condition. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

We can change the UI elements once when we'll add the Entra ID authentication options for the processor v2.


The input v2 is in preview. Input v1 is still the default and is recommended for typical use cases.

See the "Settings" section for more details about enabling the input v2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two different settings section, one which covers the details Event hub config and the other has Input v2 settings (advanced). Shouldn't this be input V2 setting?

Screenshot 2025-02-21 at 11 52 28 AM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should remove the "(advanced)" from the title, and only leave "input/processor v2" only settings.

_boolean_
(v2 only) Flag to control if the input should perform the checkpoint information migration from v1 to v2 at startup. The checkpoint migration converts the checkpoint information from the v1 format to the v2 format.

Default is `false`, which means the input will not perform the checkpoint migration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Default can also go as bullet point. In the document it looks as a separate section.

Please see below.

Screenshot 2025-02-21 at 11 49 38 AM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

We offer no storage_account_connection_string setting option in the
UI yet. We'll revise the UI once for the upcoming Entra ID auth
settings.
Introduces the concept of processor as a configurable option available
starting from stack 8.15.1 and integration 1.23.0.

I removed 'input' because technically, we have one input, with a
a configurable internal component called 'processor.'

Both legacy and modern SDKs identify the 'processor' as the internal
component responsible for processing messages fetched by event hub
partition consumers.
@elasticmachine
Copy link

💚 Build Succeeded

History

cc @zmoog

Copy link

Quality Gate failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:azure Azure Logs Team:obs-ds-hosted-services Label for the Observability Hosted Services team [elastic/obs-ds-hosted-services] Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Azure Logs] Enable azure-eventhub input v2
4 participants