updates for enterprise #1247
Conversation
**Walkthrough**

This pull request introduces several new and enhanced functionalities. It updates the CLI options to include a configurable indexer endpoint and modifies URL generation based on operation mode. Changes in metadata handling include new structures and methods for both ingestors and indexers, plus exposed retrieval of indexer metadata via HTTP handlers. Additionally, the PR adds multipart upload methods to various storage implementations and adjusts utility functions and client configurations. Overall, the changes extend the system's flexibility in endpoint configuration, metadata management, and file uploading.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant C as Client
    participant O as Options
    alt Request with Mode::Ingest
        C->>O: get_url(Mode::Ingest)
        O-->>C: Returns ingestor_endpoint URL
    else Request with Mode::Index
        C->>O: get_url(Mode::Index)
        O-->>C: Returns indexer_endpoint URL
    end
```

```mermaid
sequenceDiagram
    participant App as Application
    participant S3 as S3 Storage
    participant FS as FileSystem
    participant AW as Async Writer
    App->>S3: upload_multipart(key, path)
    S3->>FS: Read file metadata
    alt File size < 5MB
        S3->>AW: Upload file in a single part
    else File size ≥ 5MB
        S3->>FS: Split file into chunks
        loop Each Chunk
            S3->>AW: Upload chunk
        end
    end
    AW-->>App: Confirm upload result
```
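The 5 MB threshold in the second diagram drives a simple size-based dispatch. As a rough illustration (not the PR's actual code), here is a minimal sketch of that decision: only the 5 MB constant comes from the PR, while `plan_upload` and its `(offset, length)` part layout are hypothetical.

```rust
// Sketch of the size-based dispatch shown above; plan_upload is a
// hypothetical helper, only the 5 MB constant mirrors the PR.
const MIN_MULTIPART_UPLOAD_SIZE: u64 = 5 * 1024 * 1024;

/// Returns (offset, length) pairs: one part for small files, fixed-size
/// chunks (remainder in the last part) for larger ones.
fn plan_upload(file_size: u64) -> Vec<(u64, u64)> {
    if file_size < MIN_MULTIPART_UPLOAD_SIZE {
        return vec![(0, file_size)];
    }
    let mut parts = Vec::new();
    let mut offset = 0;
    while offset < file_size {
        let len = MIN_MULTIPART_UPLOAD_SIZE.min(file_size - offset);
        parts.push((offset, len));
        offset += len;
    }
    parts
}

fn main() {
    assert_eq!(plan_upload(1024), vec![(0, 1024)]); // single part
    assert_eq!(plan_upload(11 * 1024 * 1024).len(), 3); // 5 MB + 5 MB + 1 MB
}
```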
Actionable comments posted: 5
🧹 Nitpick comments (4)
src/cli.rs (1)

`407-485`: **Refactor Duplicate Logic in `get_url`**

Although adding `Mode::Index` is correct, the blocks for `Mode::Ingest` and `Mode::Index` largely duplicate logic. Consider extracting the shared logic into a helper function for maintainability.

```diff
 pub fn get_url(&self, mode: Mode) -> Url {
     let (endpoint, env_var) = match mode {
         Mode::Ingest => (&self.ingestor_endpoint, "P_INGESTOR_ENDPOINT"),
         Mode::Index => (&self.indexer_endpoint, "P_INDEXER_ENDPOINT"),
         _ => panic!("Invalid mode"),
     };
     if endpoint.is_empty() {
-        // Duplicate code for returning default self.address-based URL ...
+        return self.build_default_url(); // Example helper
     }
     // ...
 }
```

src/parseable/mod.rs (2)
`132-133`: **Consider consolidating metadata fields with a generic approach.**

This new `indexer_metadata` field closely mirrors the pattern of `ingestor_metadata`. Using a generic or unified structure could reduce code duplication and make maintenance simpler.
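As a rough sketch of what that unified structure could look like — the `NodeMetadata` name, its fields, and the role enum below are illustrative assumptions, not code from this PR:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical unified shape: one struct with a role tag instead of the
// parallel IngestorMetadata/IndexerMetadata definitions.
#[derive(Debug, Clone, Copy, Default, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum NodeType {
    #[default]
    Ingestor,
    Indexer,
}

#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct NodeMetadata {
    pub node_type: NodeType,
    pub domain_name: String,
    pub port: String,
}

impl NodeMetadata {
    // Storage file prefixes ("ingestor-*"/"indexer-*") would follow the role,
    // so loaders could filter on a single prefix per node type.
    pub fn file_prefix(&self) -> &'static str {
        match self.node_type {
            NodeType::Ingestor => "ingestor",
            NodeType::Indexer => "indexer",
        }
    }
}
```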
`268-329`: **Refactor to eliminate duplication in `store_metadata`.**

The branches for `Mode::Ingest` and `Mode::Index` share repeated logic. Extracting common steps into a helper function could reduce duplication and simplify maintenance.

src/handlers/http/modal/mod.rs (1)
`337-510`: **Duplicated struct logic between `IndexerMetadata` and `IngestorMetadata`.**

The new `IndexerMetadata` is nearly identical to `IngestorMetadata`. Consider extracting common fields into a shared struct or trait to reduce duplication and simplify maintenance.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (15)

- src/cli.rs (2 hunks)
- src/enterprise/utils.rs (2 hunks)
- src/handlers/http/cluster/mod.rs (2 hunks)
- src/handlers/http/middleware.rs (1 hunks)
- src/handlers/http/modal/ingest_server.rs (3 hunks)
- src/handlers/http/modal/mod.rs (3 hunks)
- src/lib.rs (1 hunks)
- src/metrics/prom_utils.rs (2 hunks)
- src/parseable/mod.rs (4 hunks)
- src/storage/azure_blob.rs (1 hunks)
- src/storage/localfs.rs (1 hunks)
- src/storage/object_storage.rs (1 hunks)
- src/storage/s3.rs (4 hunks)
- src/utils/mod.rs (1 hunks)
- src/utils/time.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (5)

src/storage/azure_blob.rs (3)
- src/storage/s3.rs (1): `upload_multipart` (589:595)
- src/storage/object_storage.rs (1): `upload_multipart` (93:97)
- src/storage/localfs.rs (1): `upload_multipart` (107:113)

src/metrics/prom_utils.rs (1)
- src/option.rs (1): `url` (123:125)

src/enterprise/utils.rs (2)
- src/storage/s3.rs (1): `s` (181:181)
- src/parseable/streams.rs (1): `parquet_files` (239:251)

src/cli.rs (1)
- src/option.rs (1): `mode` (127:135)

src/utils/mod.rs (1)
- src/handlers/http/modal/mod.rs (1): `get_indexer_id` (452:454)
🔇 Additional comments (22)
src/utils/time.rs (1)

`52-52`: **Good addition of Clone trait to TimeRange**

Adding the `Clone` trait to the `TimeRange` struct allows for more flexible usage patterns when you need to duplicate the time range data, which is likely needed for the new indexer functionality being added.

src/utils/mod.rs (1)
`45-50`: **Good pattern reuse for indexer ID generation**

The implementation follows the same pattern as `get_ingestor_id()`, which provides consistency in how different component identifiers are generated in the system.
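For context, the shared pattern is roughly the following; this is a sketch assuming a UUID-backed ID like `get_ingestor_id()`, and the exact body in the PR may differ:

```rust
// Sketch of the shared ID-generation pattern; assumes uuid-based IDs as
// with get_ingestor_id(). Formatting details in the real code may differ.
pub fn get_indexer_id() -> String {
    uuid::Uuid::new_v4().to_string()
}
```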
src/metrics/prom_utils.rs (2)

`22-22`: **Added Mode enum import for URL determination**

This import supports the URL generation changes based on the operation mode.

`65-66`: **URL generation now mode-specific**

The change specifies that the URL should be generated for the ingest mode, which aligns with the enterprise updates for differentiating between ingestor and indexer endpoints.

The comment "for now it is only for ingestor" is helpful for documenting the current state, but consider whether metrics should eventually be collected for indexers as well.
src/handlers/http/middleware.rs (1)

`362-367`: ❓ Verification inconclusive

**Security concern: All endpoints are now accessible in Index mode.**

The previous endpoint restriction logic for Index mode has been removed. This means all API endpoints are now accessible when running in Index mode, which could potentially expose sensitive operations that were previously restricted.

Please verify whether this change is intentional and aligned with the security model for enterprise deployments. If some endpoints should still be restricted in Index mode, consider adding appropriate filtering logic.

🏁 Script executed:

```shell
#!/bin/bash
# Check if there are any references to endpoint restrictions for Index mode elsewhere in the codebase
grep -r "Mode::Index" --include="*.rs" .
```

Length of output: 970

**Attention: Verify Endpoint Filtering for Index Mode**

It appears that the logic which previously restricted certain endpoints in Index mode has been removed from the middleware, leaving all endpoints accessible when running in this mode. The grep output shows that while `Mode::Index` is still referenced in several parts of the codebase (e.g., in `src/catalog/mod.rs`, `src/main.rs`, and others), there's no evidence of any updated filtering logic in the middleware (specifically around lines 362–367 in `src/handlers/http/middleware.rs`).

- Affected file: `src/handlers/http/middleware.rs` (lines 362–367)
- Observation: The current implementation wraps the service call without any conditional checks or filtering based on the operating mode.
- Impact: When running in Index mode, all API endpoints become accessible, potentially exposing sensitive operations that were previously restricted.

Please verify if this removal is intentional for the enterprise deployment security model. If endpoint restrictions should still apply in Index mode, consider reintroducing appropriate filtering logic.
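If restrictions should be reinstated, mode-aware gating could look roughly like the sketch below; the `Mode` enum here is a local stand-in and the allowed path prefixes are purely illustrative, not the project's real policy:

```rust
// Illustrative mode-based endpoint gating; the allow-list is a placeholder
// and would need to come from the actual enterprise security model.
enum Mode {
    Ingest,
    Index,
    All,
}

fn is_allowed(mode: &Mode, path: &str) -> bool {
    match mode {
        // Hypothetical policy: an Index node only serves health endpoints.
        Mode::Index => ["/api/v1/liveness", "/api/v1/readiness"]
            .iter()
            .any(|p| path.starts_with(p)),
        _ => true, // other modes keep full access in this sketch
    }
}

fn main() {
    assert!(is_allowed(&Mode::Index, "/api/v1/liveness"));
    assert!(!is_allowed(&Mode::Index, "/api/v1/logstream/demo"));
    assert!(is_allowed(&Mode::All, "/api/v1/logstream/demo"));
}
```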
src/storage/object_storage.rs (1)

`93-97`: **New multipart upload interface added to ObjectStorage trait.**

The `upload_multipart` method adds support for multipart file uploads across different storage backends. This is a good addition for handling large files more efficiently. However, ensure that all implementations (S3, Azure, LocalFS) properly implement this method, as the LocalFS implementation currently uses `unimplemented!()`.
src/lib.rs (1)

`75-79`: ✅ Verification successful

**HTTP client visibility and timeout changes.**

Two changes have been made to the HTTP_CLIENT:

- The client is now publicly accessible (`pub static`) rather than private
- The request timeout has been increased from 10 to 30 seconds

The increased timeout may help with handling larger requests or slower network conditions, which is beneficial for enterprise deployments. However, making the client public exposes implementation details that might be better encapsulated.

Verify whether making the HTTP_CLIENT public is necessary and used appropriately throughout the codebase:

🏁 Script executed:

```shell
#!/bin/bash
# Find all places where HTTP_CLIENT is now used outside this module
grep -r "HTTP_CLIENT" --include="*.rs" . | grep -v "src/lib.rs"
```

Length of output: 1517

**HTTP Client Visibility: Confirmed Usage Across the Codebase**

The verification shows that `HTTP_CLIENT` is used extensively in modules outside `src/lib.rs` (e.g., in `src/analytics.rs`, `src/audit.rs`, `src/handlers/http/cluster/*`, and others). Given this widespread usage, making the HTTP client public appears to be a deliberate design decision. Additionally, increasing the request timeout from 10 to 30 seconds aligns well with handling larger requests or slower network conditions in enterprise deployments.

- Public exposure justified: multiple modules rely on `HTTP_CLIENT`, so its public visibility is necessary.
- Timeout increase acceptable: the raised timeout supports more resilient network conditions.

Overall, the changes are appropriate, and no further adjustments are required.
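For reference, the described end state of the client would look roughly like this — a sketch assuming a `once_cell`-backed static and reqwest, with any other builder options in the real code omitted:

```rust
use std::time::Duration;

use once_cell::sync::Lazy;
use reqwest::Client;

// Sketch of the change as described: `pub` visibility plus a 30 s timeout
// (raised from 10 s). Other builder settings are intentionally omitted.
pub static HTTP_CLIENT: Lazy<Client> = Lazy::new(|| {
    Client::builder()
        .timeout(Duration::from_secs(30))
        .build()
        .expect("HTTP client construction should not fail")
});
```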
src/handlers/http/cluster/mod.rs (2)

`54-54`: **Consistent Import Usage**

Bringing `IndexerMetadata` and `IngestorMetadata` into scope here is straightforward and consistent with the existing structure.

`60-61`: **Maintain Naming Consistency**

Defining `IndexerMetadataArr` in parallel with `IngestorMetadataArr` ensures consistent naming conventions for collection types. No issues here.

src/cli.rs (1)

`298-305`: **Indexer Endpoint Added**

Introducing the `indexer_endpoint` field aligns with the existing style and expands configuration for indexing services. It's good that a default value is provided, though consider validating non-empty values if indexing is mandatory.

src/handlers/http/modal/ingest_server.rs (3)

`31-31`: **Importing Mode Enum**

Using `Mode` here is a natural extension if the ingest server needs mode-specific logic. Nothing concerning spotted.

`112-112`: **Storing Metadata with Explicit Mode**

Calling `store_metadata(Mode::Ingest)` is consistent with the broader shift towards mode-based metadata handling. Looks fine.

`255-255`: **Confirm Public Access to `check_querier_state`**

Changing visibility to `pub` makes this function callable from other modules. Verify that external callers cannot misuse this to bypass any internal workflows, especially around system readiness or security checks.

src/enterprise/utils.rs (1)

`3-3`: **Chrono Import for Date Handling**

Bringing in `chrono::{TimeZone, Utc}` is appropriate for robust date/time operations. No immediate issues here.

src/parseable/mod.rs (3)

`50-50`: **No issues found for the new imports.**

The import statement for `IndexerMetadata` seems correct and consistent with the existing structure.

`149-152`: **Check whether `Mode::All` also needs indexer metadata.**

The logic only loads metadata for `Mode::Index`. Please verify if running in `Mode::All` should also initialize `indexer_metadata`.

`158-158`: **No concerns with storing the new field.**

Passing `indexer_metadata` to the struct constructor looks straightforward and is consistent with the existing pattern for ingestor metadata.

src/storage/s3.rs (3)

`46-46`: **Import statement needed for async file I/O.**

Using tokio's `OpenOptions` and `AsyncReadExt` is appropriate for streaming file reads.

`66-66`: **Defines the minimum part size for multipart uploads.**

This constant (5MB) aligns with AWS S3's minimum valid chunk size for multipart operations.

`589-595`: **Public wrapper for `_upload_multipart`.**

This method succinctly exposes the multipart upload functionality. No concerns identified here.

src/handlers/http/modal/mod.rs (2)

`37-37`: **Importing `Mode` for indexing logic.**

Referencing `option::Mode` is consistent with the rest of the file's approach to handling server modes.

`40-40`: **New utility imports for ID retrieval.**

Using `get_indexer_id` and `get_ingestor_id` will help differentiate between these roles. No issues here.
```rust
async fn upload_multipart(
    &self,
    _key: &RelativePath,
    _path: &Path,
) -> Result<(), ObjectStorageError> {
    unimplemented!()
}
```
**Implementation needed for `upload_multipart`**

The method is currently stubbed with `unimplemented!()`, which is appropriate for a work-in-progress PR. However, this should be implemented before the feature is considered complete since other storage backends like S3 have working implementations.

Consider implementing the method using the commented-out code around line 382 as a starting point, or at minimum add a TODO comment with a timeline for implementation:
```diff
 async fn upload_multipart(
     &self,
     _key: &RelativePath,
     _path: &Path,
 ) -> Result<(), ObjectStorageError> {
-    unimplemented!()
+    // TODO(enterprise): Implement multipart uploads for Azure Blob storage in ticket #XXXX
+    Err(ObjectStorageError::UnhandledError(Box::new(
+        std::io::Error::new(
+            std::io::ErrorKind::Unsupported,
+            "Multipart upload not implemented for Blob Storage yet",
+        ),
+    )))
 }
```
```rust
async fn upload_multipart(
    &self,
    _key: &RelativePath,
    _path: &Path,
) -> Result<(), ObjectStorageError> {
    unimplemented!()
}
```
**Implementation needed for `upload_multipart` method.**

The `upload_multipart` method is currently unimplemented. This could cause runtime panics if this method is called in production. Consider implementing this method with actual functionality similar to the other storage backends, or at minimum, returning a proper error instead of using `unimplemented!()`.
```diff
 async fn upload_multipart(
     &self,
     _key: &RelativePath,
     _path: &Path,
 ) -> Result<(), ObjectStorageError> {
-    unimplemented!()
+    Err(ObjectStorageError::UnhandledError(Box::new(
+        std::io::Error::new(
+            std::io::ErrorKind::Unsupported,
+            "Multipart upload not implemented for LocalFS yet",
+        ),
+    )))
 }
```
```rust
pub async fn get_indexer_info() -> anyhow::Result<IndexerMetadataArr> {
    let store = PARSEABLE.storage.get_object_store();

    let root_path = RelativePathBuf::from(PARSEABLE_ROOT_DIRECTORY);
    let arr = store
        .get_objects(
            Some(&root_path),
            Box::new(|file_name| file_name.starts_with("indexer")),
        )
        .await?
        .iter()
        // this unwrap will most definitely shoot me in the foot later
        .map(|x| serde_json::from_slice::<IndexerMetadata>(x).unwrap_or_default())
        .collect_vec();

    Ok(arr)
}
```
🛠️ Refactor suggestion

**Improve Error Handling in Metadata Deserialization**

Using `unwrap_or_default()` can silently swallow parsing errors and may hinder debugging if the metadata is malformed. Prefer propagating errors or logging them for better visibility.

Consider changing to:

```diff
 .map(|x| {
-    serde_json::from_slice::<IndexerMetadata>(x).unwrap_or_default()
+    serde_json::from_slice::<IndexerMetadata>(x).unwrap_or_else(|e| {
+        error!("Failed to parse indexer metadata: {:?}", e);
+        IndexerMetadata::default()
+    })
 })
```

to detect and log failures, or fully propagate the error if data integrity is critical.
```rust
.filter_map(|file| {
    let date = file.file_path.split("/").collect_vec();

    let date = date.as_slice()[1..4].iter().map(|s| s.to_string());

    let date = RelativePathBuf::from_iter(date);

    parquet_files.entry(date).or_default().push(file);
    let year = &date[1][5..9];
    let month = &date[1][10..12];
    let day = &date[1][13..15];
    let hour = &date[2][5..7];
    let min = &date[3][7..9];
    let file_date = Utc
        .with_ymd_and_hms(
            year.parse::<i32>().unwrap(),
            month.parse::<u32>().unwrap(),
            day.parse::<u32>().unwrap(),
            hour.parse::<u32>().unwrap(),
            min.parse::<u32>().unwrap(),
            0,
        )
        .unwrap();

    if file_date < time_range.start {
        None
    } else {
        let date = date.as_slice()[1..4].iter().map(|s| s.to_string());

        let date = RelativePathBuf::from_iter(date);

        parquet_files.entry(date).or_default().push(file);
        Some("")
    }
})
.for_each(|_| {});
```
🛠️ Refactor suggestion

**Validate Path String Sub-slices**

Extracting substrings like `&date[1][5..9]` risks panics if `date[1]` is shorter than expected. Consider verifying path segment lengths to guard against malformed or unexpected file paths.

```diff
-let year = &date[1][5..9];
+if date[1].len() < 9 {
+    warn!("Unexpected file path format for: {:?}", date);
+    return None;
+}
+let year = &date[1][5..9];
```
Actionable comments posted: 0
🧹 Nitpick comments (4)

src/storage/object_storage.rs (2)

`93-97`: **Add documentation for the new trait method.**

Although this new trait method `upload_multipart` is a valuable addition, it would be beneficial to include a doc comment explaining usage details and any assumptions made about file sizes or concurrency constraints.

`847-851`: **Ensure robust error handling for partial uploads.**

While it is good that the loop continues upon upload errors, consider whether you want to provide any retry logic for partial file uploads. In high-latency or failure scenarios, having granular retries for each chunk could ensure more resilient uploads.

src/storage/s3.rs (2)

`66-66`: **Consider making the threshold configurable.**

Defining a 5 MB threshold for `MIN_MULTIPART_UPLOAD_SIZE` is reasonable, but it might be even more robust to allow a user or environment variable to configure this value for edge cases or variable bandwidth constraints.

`514-565`: **Check concurrency and finalization logic in `_upload_multipart`.**

This implementation executes part-uploads in parallel with `tokio::spawn`, which can improve speed but may also raise memory usage for very large files. Examine whether a bounded concurrency strategy or streaming approach is more suitable (see the sketch below). Additionally, you may want to handle failures in `async_writer.complete()` by aborting the multipart upload to avoid leaving stale partials.

Do you want a verification script to scan for any usage of `abort_multipart` calls or relevant error handling in other files that might be triggered upon failure?
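One bounded-concurrency option is sketched below with `futures::stream::buffer_unordered`; `upload_part` is a hypothetical stand-in for the real per-chunk writer call, and the limit of 4 is an arbitrary assumption:

```rust
use futures::stream::{self, StreamExt};

// Bounds in-flight part uploads instead of unbounded tokio::spawn.
// upload_part is a placeholder for the real AsyncWriter put-part call.
async fn upload_parts_bounded(chunks: Vec<Vec<u8>>) -> std::io::Result<()> {
    const MAX_CONCURRENT_PARTS: usize = 4; // assumed limit; tune for memory/bandwidth

    let results: Vec<std::io::Result<()>> = stream::iter(chunks.into_iter().enumerate())
        .map(|(part_no, bytes)| upload_part(part_no, bytes))
        .buffer_unordered(MAX_CONCURRENT_PARTS)
        .collect()
        .await;

    // Surface the first failure; a real implementation would also abort the
    // multipart upload here to avoid leaving stale partial objects behind.
    results.into_iter().collect()
}

async fn upload_part(part_no: usize, bytes: Vec<u8>) -> std::io::Result<()> {
    println!("uploading part {} ({} bytes)", part_no, bytes.len());
    Ok(())
}
```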
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)

- src/storage/object_storage.rs (2 hunks)
- src/storage/s3.rs (4 hunks)

🧰 Additional context used
🧬 Code Definitions (1)

src/storage/s3.rs (3)
- src/storage/mod.rs (1): `to_object_store_path` (266:268)
- src/storage/object_storage.rs (1): `upload_multipart` (93:97)
- src/storage/azure_blob.rs (1): `upload_multipart` (427:433)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (2)

src/storage/s3.rs (2)

`46-46`: **Necessary import for asynchronous file I/O.**

This import of `OpenOptions` and `AsyncReadExt` is required to support non-blocking reads, which is essential for multipart uploads.

`581-587`: **Minimal pass-through implementation.**

Forwarding `upload_multipart` to the private `_upload_multipart` method looks consistent. No further concerns.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/storage/object_storage.rs (2)

`93-97`: **Add a short docstring to the new method for clarity.**

It would be helpful to document the behavior, use cases, and expected error scenarios of `upload_multipart`, so future contributors can easily understand its role and how it integrates with the rest of the trait methods.

`847-851`: **Consider parallel uploading and improved error handling.**

Currently, the upload is performed sequentially within this loop, and if an error persists, the code logs it and proceeds. For large-scale deployments, consider performing parallel uploads for multiple files to reduce overall time. Also, evaluate whether a retry or rollback mechanism is necessary to handle partial failures in multipart uploads; a retry sketch follows below.
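A granular per-chunk retry could be layered on top, as in the sketch below; `try_upload_chunk` is a hypothetical stand-in for the actual chunk upload, and the linear backoff numbers are arbitrary:

```rust
use std::time::Duration;

// Per-chunk retry with simple linear backoff; try_upload_chunk is a
// placeholder, and the attempt/backoff values are illustrative only.
async fn upload_chunk_with_retry(chunk_no: usize, max_attempts: u32) -> std::io::Result<()> {
    let mut attempt = 0;
    loop {
        match try_upload_chunk(chunk_no).await {
            Ok(()) => return Ok(()),
            Err(e) if attempt + 1 < max_attempts => {
                attempt += 1;
                eprintln!("chunk {chunk_no}: attempt {attempt} failed: {e}; retrying");
                tokio::time::sleep(Duration::from_millis(200 * u64::from(attempt))).await;
            }
            // Retries exhausted: bubble up so the caller can roll back the
            // multipart upload rather than leave a partial object behind.
            Err(e) => return Err(e),
        }
    }
}

async fn try_upload_chunk(_chunk_no: usize) -> std::io::Result<()> {
    Ok(()) // placeholder
}
```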
src/storage/s3.rs (1)

`65-65`: **Make the minimum multipart upload size configurable if future needs demand it.**

The current 5 MB threshold is fairly standard. However, if you anticipate different file sizes or have memory constraints, consider making it user-configurable for flexibility.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)

- src/storage/object_storage.rs (2 hunks)
- src/storage/s3.rs (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
🔇 Additional comments (2)

src/storage/s3.rs (2)

`46-46`: **Import statement looks good.**

The addition of `OpenOptions` and `AsyncReadExt` is appropriate for the asynchronous file reads.

`578-584`: **Straightforward delegation to `_upload_multipart`.**

This public method is cleanly forwarding the call to the private `_upload_multipart` function. The implementation is consistent with the trait requirement.
Adds multiple updates for Parseable Enterprise

**Description**

This PR has:

**Summary by CodeRabbit**

New Features

Bug Fixes

Refactor

- `TimeRange` struct updated to support cloning.