-
-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SystemLimit instead of settings for device rate limiting #35781
base: master
Are you sure you want to change the base?
Conversation
Realistically I don't think this is a likely scenario in our current platform since a mobile worker is scoped to a domain, but noticed this vulnerability in the test and decided it was easy enough to protect against.
This commit did a lot. In addition to replacing the limit in settings with SystemLimit, I removed the feature flag for an increased limit since SystemLimit solves that, and moved the setting to enable the device rate limiter to a feature flag for ease of use. I also updated metrics to differentiate between an actual rate limited attempt vs a would-be rate limited.
@coderabbitai review |
WalkthroughThe changes modify how device rate limiting is implemented. In the user application, the Sequence Diagram(s)sequenceDiagram
participant U as User/Application
participant DRL as DeviceRateLimiter
participant Toggle as Toggle Module
participant Redis as Redis Store
participant Metric as Metrics Logger
U->>DRL: Call rate_limit_device(domain, user_id, device_id)
DRL->>Toggle: Check if DEVICE_RATE_LIMITER toggle enabled for domain
alt Toggle Enabled
DRL->>Metric: Log rate-limited event
DRL->>DRL: Generate Redis key using (domain, user_id)
DRL->>Redis: Retrieve/Update usage data with unique key
DRL->>U: Return True
else Toggle Disabled
DRL->>Metric: Log non rate-limited event
DRL->>U: Return False
end
sequenceDiagram
participant Caller as Caller
participant SL as SystemLimit
participant DB as Database
Caller->>SL: Call for_key(key, domain)
SL->>DB: Query SystemLimit objects filtering by key and domain (if provided)
DB->>SL: Return sorted matching records (descending by domain)
SL->>Caller: Return first matching limit or None
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
corehq/apps/users/device_rate_limiter.py (1)
30-30
: Consider caching domain limit lookups.
Repeatedly callingSystemLimit.for_key
on every request might be costly under heavy load. For high-volume domains, you could cache the result ofSystemLimit.for_key
for a short duration (e.g., using Django’s caching frameworks) to improve performance.def device_limit_per_user(self, domain): - return SystemLimit.for_key(DEVICE_LIMIT_PER_USER_KEY, domain=domain) or DEVICE_LIMIT_PER_USER_DEFAULT + limit = SystemLimit.for_key(DEVICE_LIMIT_PER_USER_KEY, domain=domain) + if limit is None: + limit = DEVICE_LIMIT_PER_USER_DEFAULT + return limitcorehq/project_limits/models.py (1)
101-111
: LGTM! The implementation effectively handles domain-specific limits.The
for_key
method is well-implemented with:
- Clear prioritization of domain-specific limits over global limits
- Efficient query construction using Q objects
- Proper handling of the optional domain parameter
Consider adding docstring examples to illustrate the prioritization behavior:
""" Return the value associated with the given key, prioritizing specificity + + Examples: + >>> SystemLimit.objects.create(key='foo', limit=1) # global limit + >>> SystemLimit.objects.create(key='foo', domain='bar', limit=2) # domain-specific + >>> SystemLimit.for_key('foo') # returns 1 (global) + >>> SystemLimit.for_key('foo', domain='bar') # returns 2 (domain-specific) + >>> SystemLimit.for_key('foo', domain='baz') # returns 1 (falls back to global) """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
corehq/apps/users/device_rate_limiter.py
(5 hunks)corehq/apps/users/tests/test_device_rate_limiter.py
(1 hunks)corehq/project_limits/models.py
(1 hunks)corehq/project_limits/tests/test_system_limit.py
(1 hunks)corehq/toggles/__init__.py
(1 hunks)settings.py
(0 hunks)
💤 Files with no reviewable changes (1)
- settings.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: - Review the code following best practises and st...
**/*.py
: - Review the code following best practises and standards
corehq/project_limits/models.py
corehq/apps/users/tests/test_device_rate_limiter.py
corehq/project_limits/tests/test_system_limit.py
corehq/toggles/__init__.py
corehq/apps/users/device_rate_limiter.py
🔇 Additional comments (9)
corehq/apps/users/device_rate_limiter.py (3)
17-18
: Consistent naming for keys and fallback default.
Defining separate constants for the device limit key and its default value is good practice and keeps the logic clear.
67-74
: Toggle-based rate limiting logic is well-structured.
Using theDEVICE_RATE_LIMITER
toggle here is clear. The contrasting metric counters (rate_limited
vs.rate_limited.test
) help differentiate environments or usage scenarios. Ensure that the “test” metric is being tracked intentionally.
76-84
: Domain-scoped Redis key fixes domain collision issues.
Including the domain in the Redis key creation is a solid fix to scope usage data more accurately. This prevents conflicts across domains.corehq/toggles/__init__.py (1)
2998-3005
: New feature toggle aligns with the PR’s objective.
TheDEVICE_RATE_LIMITER
toggle is declared clearly and includes an owner. Consider adding ahelp_link
if there’s existing documentation for domain admins.corehq/project_limits/tests/test_system_limit.py (1)
1-19
: Comprehensive test coverage for system limits.
These tests thoroughly confirm the fallback toNone
, the general limit, and the domain-specific limit. The approach is straightforward and covers core scenarios.corehq/apps/users/tests/test_device_rate_limiter.py (4)
11-12
: LGTM! Good transition to TestCase for database access.The change from
SimpleTestCase
toTestCase
is appropriate given the need to interact with the database forSystemLimit
objects.
65-74
: LGTM! Well-structured test for domain-specific higher limits.The test effectively verifies that:
- Domain-specific limits take precedence
- Different domains can have different limits
- Higher domain-specific limits allow more devices
75-85
: LGTM! Comprehensive test for domain-specific lower limits.The test properly verifies that:
- Global limits can be overridden by domain-specific limits
- Lower domain-specific limits are enforced correctly
- Other domains still use the global limit
97-101
: LGTM! Good test for domain isolation.The test ensures that:
- Device limits are properly scoped to domains
- Activity in one domain doesn't affect limits in another
- Same device ID can be rate-limited differently across domains
def test_allowed_if_no_devices_have_been_used_yet(self): | ||
self.assertFalse(device_rate_limiter.rate_limit_device(self.domain, 'user-id', 'new-device-id')) | ||
|
||
@override_settings(DEVICE_LIMIT_PER_USER=2) | ||
@flag_enabled("DEVICE_RATE_LIMITER") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do this on the class itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works at the class level, but then @flag_disabled didn't work for the one test where I wanted to use it since the patch had already been applied for the @flag_enabled decorator. I briefly looked to see if it would be an easy fix but came up empty handed and decided to just do this instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's annoying
self.addCleanup(clear_redis) | ||
|
||
@flag_enabled("DEVICE_RATE_LIMITER") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this decorator not be applied to the class? If not, wonder how hard it would be to change it to work at the class level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works at the class level, but then @flag_disabled
didn't work for the one test where I wanted to use it since the patch had already been applied for the @flag_enabled
decorator. I briefly looked to see if it would be an easy fix but came up empty handed and decided to just do this instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these decorators originally and I vaguely recall also running into that shortcoming and not seeing an easy fix. Would love to see that addressed if anyone can figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it as a hackathon idea for tomorrow 🤷🏻
- add more assertions in SystemLimit.for_key tests - handle case where limit is 0 - use tag to determine if rate limited metric was enforced or not
Expecting tests to fail due to not clearing redis. Offlining with @millerdev about what the proper solution is. |
Just kidding this was because I wasn't clearing the classmethod cache correctly! Fixed in dbdd553 |
- For testing, skip caching for_key as it creates too many problems - Make sure to clear the cache when a SystemLimit is deleted too
Clearing all of redis masked issues with the caching logic in the SystemLimit.for_key which isn't good. The right approach is to skip caching when running in a test.
Alright no more commits! Sorry for the churn. |
This doesn't actually matter since the data being saved in redis is very small, but this change caches more wisely. Previously each key+domain combo would have a cached key, even though in most cases, a domain will not have a unique value from the global value. This change only stores a key+domain value in cache when necessary.
I lied. Made a slight change to the caching strategy for SystemLimit that came up during the dev block. No more changes starting now. |
corehq/project_limits/models.py
Outdated
self._get_global_limit.clear(self.__class__, self.key) | ||
self._get_domain_specific_limit.clear(self.__class__, self.key, self.domain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be smarter about only clearing the cache for the specific methods when relevant (if self.domain
is a blank string vs something specific)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...hmm but what someone modifies an existing limit to be scoped to domain (or vice versa). That would not be an ideal usage, but would result in us not clearing the correct cache with this updated logic...
We know which cache to clear depending on the domain field of the object being saved/deleted.
domain_filter |= models.Q(domain=domain) | ||
filters = models.Q(key=key) & domain_filter | ||
return SystemLimit.objects.filter(filters).order_by("-domain").values_list("limit", flat=True).first() | ||
domain_limit = cls._get_domain_specific_limit(key, domain) if domain else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change only stores a key+domain value in cache when necessary.
Not quite clear what was meant by "when necessary", but this will cache a value (possibly None
) for key+domain any time bool(domain) is True
, right?
Could the second DB query below be avoided in that case (and a useful limit cached for each key+domain) if _get_domain_specific_limit
used the original query?
SystemLimit.objects.filter(filters).order_by("-domain").values_list("limit", flat=True).first()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might complicate cache clearing though since the global limit would be cached per domain, and those would not be cleared when the global limit is changed.
I'm curious how much more performant this is with caching applied? The SQL query should be very fast, possibly not much slower than hitting redis. Would be interesting to measure.
This reverts commit ce7b010. It is possible to modify an existing limit and add a domain, in which case we would be clearing the wrong cache, so we need to revert.
Product Description
Technical Summary
https://dimagi.atlassian.net/browse/SAAS-16355
https://dimagi.atlassian.net/browse/SAAS-16561
I recently added a
SystemLimit
class that makes it much easier to manage limits, and moving this limit over from settings to the SystemLimit in this PR. There are a few other changes that go along with this refactor, like moving the setting to enable the device rate limiter over to a feature flag, and a slight bug fix (sort of) related to the redis key not being scoped to a domain.Feature Flag
DEVICE_RATE_LIMITER
Safety Assurance
Safety story
Automated test coverage
QA Plan
No, I'll test myself.
Rollback instructions
Labels & Review