Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added BOM sniffing check to file load #565

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ async = ["async-trait"]
lazy_static = "1.4"
serde = "1.0"
nom = "7"
encoding_rs_io = "0.1"

async-trait = { version = "0.1", optional = true }
toml = { version = "0.8", optional = true }
Expand Down
15 changes: 12 additions & 3 deletions src/file/source/file.rs
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
use std::env;
use std::error::Error;
use std::fs;
use std::io;
use std::io::Read;
use std::path::PathBuf;

use encoding_rs_io::DecodeReaderBytesBuilder;

use crate::file::{
format::ALL_EXTENSIONS, source::FileSourceResult, FileSource, FileStoredFormat, Format,
};
Expand Down Expand Up @@ -114,8 +116,15 @@ where
.and_then(|base| pathdiff::diff_paths(&filename, base))
.unwrap_or_else(|| filename.clone());

// Read contents from file
let text = fs::read_to_string(filename)?;
let mut text = String::new();

DecodeReaderBytesBuilder::new()
// On Windows, we need to check for added Byte Order Mark (BOM). To
// do this, we don't specify an encoding, and enable BOM sniffing.
.encoding(None)
.bom_sniffing(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this should have tests. We ask that the PR be split into two commits

  1. Add the tests showing the bad behavior (ie they pass)
  2. Change behavior and update the tests to reflect that change in behavior

In this case, I would want to have the first commit have a test that is parsing content with a BOM and to show the failure that happens. The follow up commit should then strip the BOM and update the test to not error anymore.

.build(std::fs::File::open(filename)?)
.read_to_string(&mut text)?;
Comment on lines +121 to +127
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I'm reading the docs correctly, this does more than BOM sniffing / stripping but also handles other encodings. That should be a separate question, discussed in an issue first. This also then feels like a heavy weight dependency just for handling BOM.


Ok(FileSourceResult {
uri: Some(uri.to_string_lossy().into_owned()),
Expand Down