
Replace dgraph-io/badger cache storage with etcd-io/bbolt #42571

Closed
wants to merge 36 commits into from

Conversation

stefans-elastic
Contributor

@stefans-elastic stefans-elastic commented Feb 3, 2025

Proposed commit message

Replace the dgraph-io/badger persistent storage for the key-value cache with etcd-io/bbolt. Originally this was meant simply to get rid of the go.opencensus.io dependency introduced by badger (please see the parent issue for more details). After it became evident that this would not remove the go.opencensus.io dependency, it was decided that the work should still be done, since etcd-io/bbolt is already used elsewhere in the project and having multiple storage backends for the cache is undesirable (again, please see the comments in the parent issue for more details).

The implementation should be fairly straightforward, but I would like to clarify one thing: since bolt doesn't support value expiration, the expiration time (and TTL) is stored as metadata of the value. Upon retrieval the value is checked for expiration; if it has expired, nil is returned and the value is deleted from the bolt DB.
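For illustration, a minimal sketch (not the actual PR code) of the scheme described above, with the TTL stored as metadata next to the value and expired entries removed lazily on read. The package, bucket, and type names are hypothetical.

package kvcache

import (
	"encoding/json"
	"time"

	bolt "go.etcd.io/bbolt"
)

// bucketName is a hypothetical bucket holding the cache entries.
var bucketName = []byte("kv-cache")

// entry wraps the cached value with its expiration metadata, since bbolt has
// no built-in TTL support.
type entry struct {
	Value     []byte    `json:"value"`
	ExpiresAt time.Time `json:"expires_at"` // zero time means "never expires"
}

// Put stores the value together with its expiration time.
func Put(db *bolt.DB, key, value []byte, ttl time.Duration) error {
	e := entry{Value: value}
	if ttl > 0 {
		e.ExpiresAt = time.Now().Add(ttl)
	}
	raw, err := json.Marshal(e)
	if err != nil {
		return err
	}
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists(bucketName)
		if err != nil {
			return err
		}
		return b.Put(key, raw)
	})
}

// Get returns nil for missing or expired keys; expired entries are deleted.
func Get(db *bolt.DB, key []byte) ([]byte, error) {
	var value []byte
	err := db.Update(func(tx *bolt.Tx) error {
		b := tx.Bucket(bucketName)
		if b == nil {
			return nil
		}
		raw := b.Get(key)
		if raw == nil {
			return nil
		}
		var e entry
		if err := json.Unmarshal(raw, &e); err != nil {
			return err
		}
		if !e.ExpiresAt.IsZero() && time.Now().After(e.ExpiresAt) {
			return b.Delete(key) // expired: drop it and report a miss
		}
		value = e.Value
		return nil
	})
	return value, err
}

Note that using db.Update (rather than db.View) in Get lets the expired entry be deleted in the same transaction, at the cost of taking a write lock on every read.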

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@stefans-elastic stefans-elastic self-assigned this Feb 3, 2025
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 3, 2025
Contributor

mergify bot commented Feb 3, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @stefans-elastic? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

@stefans-elastic stefans-elastic added the Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team label Feb 4, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 4, 2025
@stefans-elastic stefans-elastic added needs_team Indicates that the issue/PR needs a Team:* label backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify labels Feb 4, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 4, 2025
@botelastic

botelastic bot commented Feb 4, 2025

This pull request doesn't have a Team:<team> label.

@stefans-elastic stefans-elastic changed the title Drop dbadger io Replace dgraph-io/badger cache storage with etcd-io/bbolt Feb 4, 2025
@stefans-elastic stefans-elastic marked this pull request as ready for review February 4, 2025 11:41
@stefans-elastic stefans-elastic requested review from a team as code owners February 4, 2025 11:41
Comment on lines 108 to 109
if err != nil {
    c.log.Debugf("Key '%s' not found in key-value store", k)
Member

Sorry, but could you log the error instead of saying the key was not found?
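For example, something along these lines (illustrative only, not necessarily the exact code in the PR):

if err != nil {
	// log the underlying error instead of assuming the key was simply missing
	c.log.Debugf("failed to get key '%s' from the key-value store: %v", k, err)
}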

Contributor Author

Sure, done. Please check it out.

@stefans-elastic
Contributor Author

/test

@stefans-elastic
Contributor Author

@elastic/beats-tech-leads / @VihasMakwana could you review this PR?

@cmacknz
Member

cmacknz commented Feb 11, 2025

Looks like the persistent cache has a few uses related to cloudfoundry, including the add_cloudfoundry_metadata processor via the newClientCacheWrap function.

rg newClientCacheWrap
x-pack/libbeat/common/cloudfoundry/hub.go
82:func (h *Hub) ClientWithCache() (Client, error) {

x-pack/libbeat/common/cloudfoundry/cache_integration_test.go
41:             client, err := hub.ClientWithCache()
52:             client, err := hub.ClientWithCache()

x-pack/libbeat/processors/add_cloudfoundry_metadata/add_cloudfoundry_metadata.go
51:     client, err := hub.ClientWithCache()
  1. Let's indicate in the changelog that this change only impacts Cloudfoundry-related functionality. It looks like the impact would be that we essentially clear the cache and start from scratch, which doesn't seem breaking to me.
  2. Do we have any way to sanity check any of this running on Cloudfoundry itself?

@stefans-elastic
Contributor Author

Looks like the persistent cache has a few uses related to cloudfoundry, including the add_cloudfoundry_metadata processor via the newClientCacheWrap function.

  1. Let's indicate in the changelog that this change only impacts Cloudfoundry-related functionality. It looks like the impact would be that we essentially clear the cache and start from scratch, which doesn't seem breaking to me.
  2. Do we have any way to sanity check any of this running on Cloudfoundry itself?
  1. I've updated the changelog message. Please take a look.
  2. I'm not sure how to do this (I would need some assistance with testing on Cloudfoundry).

Member

@cmacknz cmacknz left a comment

Changelog looks good. I also have no experience working with cloudfoundry, so I can't be much help there, but we must have tested this in the past.

I see @jsoriano in the Git history so maybe he can give us some leads.

@jsoriano
Member

I see @jsoriano in the Git history so maybe he can give us some leads.

I cannot help with testing as I haven't used CF in years, but I can try to help with the background of this cache.

You can find here a summary of the analysis that led to using badger for this use case: #19511 (comment)

The summary-of-the-summary is that we needed it to perform well in clusters with several thousand applications, and we needed it to clean up unused entries. Badger fit better than other alternatives, as it performed well under pressure and had built-in TTL support.
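(For reference, a rough sketch of what badger's built-in TTL looks like, assuming the github.com/dgraph-io/badger/v4 module path; the package, function, and variable names are placeholders.)

package kvcache

import (
	"time"

	badger "github.com/dgraph-io/badger/v4"
)

// putWithTTL shows badger's native expiration: the entry is written with a
// TTL and badger stops returning it (and eventually garbage-collects it)
// once the TTL passes, with no extra bookkeeping in the caller.
func putWithTTL(db *badger.DB, key, value []byte, ttl time.Duration) error {
	return db.Update(func(txn *badger.Txn) error {
		return txn.SetEntry(badger.NewEntry(key, value).WithTTL(ttl))
	})
}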

The kind of expiration added in this PR may not work so well for this use case, because it won't remove entries that stop being accessed, such as the ones for applications that stop producing events, which is the most common case in which we want entries to be removed here.
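(One possible mitigation, not part of this PR, would be a periodic sweep that deletes expired entries even if they are never read again. A rough sketch, reusing the hypothetical entry layout and bucket from the sketch in the PR description above:)

// sweepExpired scans the whole bucket and removes entries whose stored
// expiration time has already passed, so entries for applications that
// stopped producing events still get cleaned up eventually. It could be run
// from a ticker-driven goroutine.
func sweepExpired(db *bolt.DB) error {
	now := time.Now()
	return db.Update(func(tx *bolt.Tx) error {
		b := tx.Bucket(bucketName)
		if b == nil {
			return nil
		}
		var expired [][]byte
		err := b.ForEach(func(k, v []byte) error {
			var e entry
			if err := json.Unmarshal(v, &e); err != nil {
				return nil // skip malformed entries
			}
			if !e.ExpiresAt.IsZero() && now.After(e.ExpiresAt) {
				// copy the key; it is used after iteration finishes
				expired = append(expired, append([]byte(nil), k...))
			}
			return nil
		})
		if err != nil {
			return err
		}
		for _, k := range expired {
			if err := b.Delete(k); err != nil {
				return err
			}
		}
		return nil
	})
}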

Another thing to take into account is that add_cloudfoundry_metadata may be unnecessary in current deployments, as Cloudfoundry started attaching this metadata to all events and we don't need to query and cache it (see #26868), so maybe we don't need to care a lot about its performance.

@jsoriano
Member

Btw, maybe we need to add some release notes, to warn users of add_cloudfoundry_metadata to be careful when upgrading to the version containing this change, as their caches will be regenerated on first start, potentially making loads of queries to CF APIs.

@stefans-elastic
Contributor Author

@jsoriano

Btw, maybe we need to add some release notes

Do you mean I need to add an entry to CHANGELOG.next.asciidoc?

@jsoriano
Member

Btw, maybe we need to add some release notes

Do you mean I need to add an entry to CHANGELOG.next.asciidoc?

Yes, maybe this is enough, as this seems to appear in https://www.elastic.co/guide/en/beats/libbeat/current/release-notes-8.17.2.html

@stefans-elastic
Contributor Author

@jsoriano I've added a CHANGELOG.next.asciidoc entry; please take a look.

@jsoriano
Member

PR open to try to address the root issue upstream: hypermodeinc/badger#2169

@mauri870
Member

PR open to try to address the root issue upstream: hypermodeinc/badger#2169

Thanks for that! The repo seems fairly active; if we can avoid the rewrite, that would be great.

@cmacknz
Member

cmacknz commented Feb 13, 2025

We definitely need to test this on CloudFoundry before releasing it to anybody. Looking through the log of closed SDHs, it is definitely still used, but I don't know by how many users.

That CI doesn't effectively test this is concerning; we are maintaining this by hoping nothing that breaks it ever changes.

If we can fix upstream and avoid a potential long tail of support pain here, then that could be the best path. Efforts are probably better focused on figuring out how to maintain CloudFoundry support properly first.

@stefans-elastic
Contributor Author

We definitely need to test this on CloudFoundry before releasing it to anybody. Looking through the log of closed SDHs, it is definitely still used, but I don't know by how many users.

That CI doesn't effectively test this is concerning; we are maintaining this by hoping nothing that breaks it ever changes.

If we can fix upstream and avoid a potential long tail of support pain here, then that could be the best path. Efforts are probably better focused on figuring out how to maintain CloudFoundry support properly first.

@cmacknz Should I close this PR?

@cmacknz
Member

cmacknz commented Feb 14, 2025

It definitely sounds like we are not in a position to make big changes to this yet. I'm not officially a codeowner for cloudfoundry, so if you feel like the risk is too great, go ahead and close it.

Just because we wrote code doesn't mean we have to keep it :)

@stefans-elastic
Contributor Author

@cmacknz I really wouldn't like to break anything, so I guess it would be safer not to merge this PR. That being said, having two different cache stores (bbolt and badger) in the codebase isn't ideal. Since using bbolt here might be risky (the lack of TTL functionality might cause stale data to be kept in the storage), we might want to consider switching from bbolt to badger in the other places.

@jsoriano
Member

Fix merged upstream, but I don't know when a version containing it will be released.

@stefans-elastic
Contributor Author

Closing this PR as we are going to update badger once a new version is released that doesn't depend on opencensus (the change was made in this PR).

Labels
  • backport-8.x Automated backport to the 8.x branch with mergify
  • backport-8.16 Automated backport with mergify
  • backport-8.17 Automated backport with mergify
  • backport-9.0 Automated backport to the 9.0 branch
  • bug
  • Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
  • Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop dbadger-io dependency
7 participants