GH-45394: [C++] Handle Single-Line JSON Without Line Ending #45443
Open: JOBIN-SABU wants to merge 8 commits into apache:main from JOBIN-SABU:fix-45394
+55 −9
6cd73d5 Fix issue #45394: Handle single-line JSON without line ending (JOBIN-SABU)
f9b5850 Update file_json.cc (JOBIN-SABU)
665d39c Delete test.json (JOBIN-SABU)
8da0823 Merge branch 'fix-45394' into my-correction-branch (JOBIN-SABU)
68f32c3 Merge pull request #1 from JOBIN-SABU/my-correction-branch (JOBIN-SABU)
2c0ca04 Update CMakeLists.txt (JOBIN-SABU)
bafd6b7 Partial fix for issue #45394 (JOBIN-SABU)
8acf213 Merge branch 'fix-45394' of https://github.com/JOBIN-SABU/arrow into … (JOBIN-SABU)
@@ -0,0 +1,13 @@
# Add test source files
set(TEST_SOURCES
    test_file_json.cc
)

# Create test executable
add_executable(ArrowJsonTests ${TEST_SOURCES})

# Link Google Test and your project libraries
target_link_libraries(ArrowJsonTests gtest gtest_main arrow_dataset arrow_io)

# Add tests
add_test(NAME ArrowJsonTests COMMAND ArrowJsonTests)
@@ -0,0 +1,81 @@
#include <gtest/gtest.h>
#include "arrow/dataset/file_json.h"
#include "arrow/io/memory.h"
#include "arrow/status.h"
#include "arrow/testing/gtest_util.h"

using namespace arrow;
using namespace arrow::dataset;

class JsonFragmentScannerTest : public ::testing::Test {
 protected:
  void SetUp() override {
    // Set up necessary objects and state for the tests
  }

  void TearDown() override {
    // Clean up after tests
  }
};

TEST_F(JsonFragmentScannerTest, InvalidBlockSize) {
  FragmentScanRequest scan_request;
  JsonFragmentScanOptions format_options;
  JsonInspectedFragment inspected;
  Executor* cpu_executor = nullptr;

  format_options.read_options.block_size = -1;  // Invalid block size

  auto result = JsonFragmentScanner::Make(scan_request, format_options, inspected, cpu_executor);
  ASSERT_FALSE(result.ok());
  ASSERT_EQ(result.status().code(), StatusCode::Invalid);
  ASSERT_EQ(result.status().message(), "Block size must be positive");
}

TEST_F(JsonFragmentScannerTest, ValidBlockSize) {
  FragmentScanRequest scan_request;
  JsonFragmentScanOptions format_options;
  JsonInspectedFragment inspected;
  Executor* cpu_executor = nullptr;

  format_options.read_options.block_size = 1024;  // Valid block size
  inspected.num_bytes = 2048;

  auto result = JsonFragmentScanner::Make(scan_request, format_options, inspected, cpu_executor);
  ASSERT_TRUE(result.ok());
}

TEST_F(JsonFragmentScannerTest, SingleLineJson) {
  FragmentScanRequest scan_request;
  JsonFragmentScanOptions format_options;
  JsonInspectedFragment inspected;
  Executor* cpu_executor = nullptr;

  format_options.read_options.block_size = 1024;
  inspected.num_bytes = 1024;

  // Create a single-line JSON input stream
  std::string json_content = R"({"key": "value"})";
  inspected.stream = std::make_shared<arrow::io::BufferReader>(json_content);

  auto result = JsonFragmentScanner::Make(scan_request, format_options, inspected, cpu_executor);
  ASSERT_TRUE(result.ok());
}

TEST_F(JsonFragmentScannerTest, MultiLineJson) {
  FragmentScanRequest scan_request;
  JsonFragmentScanOptions format_options;
  JsonInspectedFragment inspected;
  Executor* cpu_executor = nullptr;

  format_options.read_options.block_size = 1024;
  inspected.num_bytes = 2048;

  // Create a multi-line JSON input stream
  std::string json_content = R"({"key1": "value1"}
{"key2": "value2"})";
  inspected.stream = std::make_shared<arrow::io::BufferReader>(json_content);

  auto result = JsonFragmentScanner::Make(scan_request, format_options, inspected, cpu_executor);
  ASSERT_TRUE(result.ok());
}
@@ -0,0 +1 @@
{"field": 1}
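
For readers following the diff, the behaviour named in the PR title can be illustrated in isolation. The sketch below is not Arrow code and not the change made in this PR: `ChunkJsonLines` is a hypothetical helper, and it assumes the input is newline-delimited JSON read in fixed-size blocks. It carries an unfinished record from one block to the next and, at end of input, emits the last record even when no trailing newline follows it, which is the case a one-line file like the fixture above exercises.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Arrow): walks `data` in blocks of
// `block_size` bytes and returns one string per newline-delimited record.
std::vector<std::string> ChunkJsonLines(std::string_view data,
                                        std::size_t block_size) {
  std::vector<std::string> records;
  if (block_size == 0) return records;  // a real reader would report an error

  std::string partial;  // bytes of a record whose '\n' has not been seen yet
  for (std::size_t pos = 0; pos < data.size(); pos += block_size) {
    std::string_view block = data.substr(pos, block_size);
    std::size_t start = 0;
    while (true) {
      std::size_t nl = block.find('\n', start);
      if (nl == std::string_view::npos) break;
      // A newline completes the record that may have started in an
      // earlier block.
      partial.append(block.substr(start, nl - start));
      if (!partial.empty()) records.push_back(partial);
      partial.clear();
      start = nl + 1;
    }
    // Carry the unterminated tail of this block over to the next one.
    partial.append(block.substr(start));
  }

  // End of input: flush the final record even without a line ending.
  if (!partial.empty()) records.push_back(partial);
  return records;
}
```

Calling `ChunkJsonLines("{\"field\": 1}", 1024)` yields a single record here. How Arrow's own chunker treats the trailing record is not shown in this diff, so the sketch should be read only as a picture of the intended outcome.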
Is there any reason you're not using the existing file_json_test.cc file and infrastructure, and are adding all this new CMake instead?
https://github.com/apache/arrow/blob/16c7f1a0bbcfad20eba7c63bc86d3da784d1db34/cpp/src/arrow/dataset/file_json_test.cc
Hi @raulcd,
I sincerely apologize for overlooking the existing file_json_test.cc file and the associated infrastructure. I should have checked and integrated my changes into the existing test framework instead of introducing new CMake configurations.
I’ll make the necessary corrections to align with the existing structure and update the PR accordingly. Thanks for pointing this out!
no worries at all! Thanks for your contributions!
Hi @raulcd,
I hope you're doing well.
I've made progress on fixing issue #45394, but I've encountered some challenges with running the tests due to hardware and internet limitations. While I believe I've addressed the core of the problem, the tests are still not passing, and I think this might be an upstream issue.
I've pushed my changes to the fix-45394 branch. Could you please take it from here and help with running the tests and finalizing any additional changes?
I appreciate your assistance and look forward to your feedback.
Thank you!