printf: accept non-UTF-8 input in FORMAT and ARGUMENT arguments #6812
Other implementations of `printf` permit arbitrary data to be passed to `printf`. The only restriction is that a null byte terminates FORMAT and ARGUMENT argument strings (since they are C strings).
The current implementation only accepts FORMAT and ARGUMENT arguments that are valid UTF-8 (this is being enforced by clap).
This commit removes the UTF-8 validation by switching to OsStr and OsString.
This allows users to use `printf` to transmit or reformat null-safe but not UTF-8-safe data, such as text encoded in an 8-bit text encoding. See the `non_utf_8_input` test for an example (ISO-8859-1 text).
Probably needs some refinement. If merged, will resolve #6804 (the
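To illustrate the description above, here is a minimal sketch of why an `OsStr`-based argument can carry ISO-8859-1 text that a `&str`-based one cannot. It is not code from this PR: `arg_bytes` is a hypothetical helper, and the example assumes a Unix target, since it relies on `std::os::unix::ffi`.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper (not from the PR): view an argument as raw bytes.
/// Assumes a Unix target, where an OsStr is an arbitrary byte sequence.
fn arg_bytes(arg: &OsStr) -> &[u8] {
    arg.as_bytes()
}

fn main() {
    // "güete" in ISO-8859-1: the byte 0xFC encodes 'ü' and is never valid UTF-8.
    let raw: Vec<u8> = b"g\xFCete".to_vec();
    let arg = OsString::from_vec(raw.clone());

    // A &str-based API has no way to represent this argument...
    assert!(arg.to_str().is_none());
    // ...while the byte view hands it back unchanged.
    assert_eq!(arg_bytes(&arg), raw.as_slice());
}
```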
@@ -126,3 +141,50 @@ fn extract_value<T: Default>(p: Result<T, ParseError<'_, T>>, input: &str) -> T
}
}
}

pub fn bytes_from_os_str(input: &OsStr) -> UResult<&[u8]> {
could you please add a unit test for this function?
and get_str_or_exit_with_error if possible?
The commit I just pushed cleans up the error handling. I renamed this function to try_get_bytes_from_os_str. I'm not sure how to test this, because on most platforms this operation never fails. I did add a test for this much easier-to-trigger error:
0f3bc3f#diff-6ea56f426af08b614e48fe26ab7345fd0f2901e87b6b77d5ccf37e4eff6dd3d1
❯ ./target/release/printf '%d' "$(coreutils printf 'Swer an rehte g\xFCete')"
./target/release/printf: invalid (non-UTF-8) argument like 'Swer an rehte g�ete' encountered
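For readers wondering why the conversion "never fails" on most platforms, here is a rough sketch of what such a function might look like. It is an assumption, not the PR's implementation: `try_bytes` is a hypothetical stand-in for `try_get_bytes_from_os_str`, a plain `Result<_, String>` replaces `UResult`, and the error text is made up for illustration.

```rust
use std::ffi::OsStr;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;

/// Hypothetical stand-in for the PR's `try_get_bytes_from_os_str`.
/// The real function returns a `UResult`; a plain `Result` is used here.
fn try_bytes(input: &OsStr) -> Result<&[u8], String> {
    // On Unix an OsStr is an arbitrary byte sequence, so this branch cannot fail.
    #[cfg(unix)]
    let bytes = input.as_bytes();

    // Elsewhere this sketch falls back to requiring valid UTF-8,
    // which is the only path that can produce an error.
    #[cfg(not(unix))]
    let bytes = input
        .to_str()
        .ok_or_else(|| String::from("argument is not valid UTF-8 (illustrative message)"))?
        .as_bytes();

    Ok(bytes)
}

fn main() {
    // Valid UTF-8 converts on every platform.
    let bytes = try_bytes(OsStr::new("caf\u{e9}")).expect("valid UTF-8 always converts");
    assert_eq!(bytes, "caf\u{e9}".as_bytes());
}
```

Under these assumptions the error branch only exists on non-Unix targets, which would explain why it is hard to cover with a portable unit test.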
0f3bc3f to d592d16
I am working on the handling of non-UTF-8 inputs in See the related commit I think we should organize ourselves to not duplicate code.
If you think your PR will be merged soonish, I will convert this PR to a draft, wait for your PR to be merged, and then update this branch to use those conversion functions.
It was just merged 👍
6a041b9 to a65a474
@RenjiSann I've included some changes to the code you just added. Please let me know if you have any issues with the changes.
I'm good with all of it. The precaution regarding the potentially overlapping
a65a474 to a7ec92c
A few lines are not covered by unit tests (low code coverage).