v0.4 with rewritten parsers + python bindings #45
cc @bovee in case you haven't seen that
This is really cool! My only nit is I think your "random TSV" example is actually from a SAM file header, but SAM files are technically TSVs so that's probably okay. :) The new streaming iterator syntax is neat; I've been playing with building toy parsers with similar style so it's really nice to see needletail has that. Super excited for new stable maturin too; didn't know they were finally nailing that down.
I just tried out this branch (see this commit), and was a bit sad to not see a
The second commit is pretty aggressive (sorry!) with all the code replacing, but I think the first one is less invasive. I can open a PR against this one if you want.
oh, and I really liked the new iterator style! It makes working with errors a breeze, I used to have a lot of
I like the parse_fastx_reader a lot, I forgot to implement it again but it was on my TODO list in the back of my brain.
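For readers skimming this thread, here is a minimal sketch of the streaming-iterator style being praised above. It is a toy, std-only FASTA reader, not needletail's or seq_io's actual code: `ToyFastaReader` and `Record` are hypothetical names invented for illustration. The point it shows is that `next()` hands back an `Option<Result<...>>` whose record borrows from the reader's own buffers, so errors surface per record and can be handled with `?` or `unwrap` inside a plain `while let` loop.

```rust
use std::io::{self, BufRead};

/// A record that borrows from the reader's internal buffers (hypothetical toy type).
struct Record<'a> {
    id: &'a str,
    seq: &'a str,
}

/// Toy FASTA reader in a streaming-iterator style (hypothetical, for illustration only).
struct ToyFastaReader<R: BufRead> {
    inner: R,
    pending: Option<String>, // header line read ahead while scanning the previous record
    id: String,
    seq: String,
}

impl<R: BufRead> ToyFastaReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, pending: None, id: String::new(), seq: String::new() }
    }

    /// Not `Iterator::next`: the returned record borrows from `self`,
    /// so it must be dropped before the next call.
    fn next(&mut self) -> Option<io::Result<Record<'_>>> {
        // Use the header we already read ahead, or read the next non-empty line.
        let header = match self.pending.take() {
            Some(h) => h,
            None => loop {
                let mut line = String::new();
                match self.inner.read_line(&mut line) {
                    Ok(0) => return None, // clean end of input
                    Ok(_) if line.trim().is_empty() => continue,
                    Ok(_) => break line,
                    Err(e) => return Some(Err(e)),
                }
            },
        };
        let id = match header.trim_end().strip_prefix('>') {
            Some(id) => id,
            None => {
                let err = io::Error::new(io::ErrorKind::InvalidData, "expected '>' at record start");
                return Some(Err(err));
            }
        };
        self.id.clear();
        self.id.push_str(id);
        self.seq.clear();
        // Accumulate sequence lines until the next header or end of input.
        loop {
            let mut line = String::new();
            match self.inner.read_line(&mut line) {
                Ok(0) => break,
                Ok(_) if line.starts_with('>') => {
                    self.pending = Some(line);
                    break;
                }
                Ok(_) => self.seq.push_str(line.trim_end()),
                Err(e) => return Some(Err(e)),
            }
        }
        Some(Ok(Record { id: &self.id, seq: &self.seq }))
    }
}
```

A caller would loop with `while let Some(result) = reader.next() { let rec = result?; /* use rec.id, rec.seq */ }`, which keeps error handling in one place per record.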
Couple of minor typos. I didn't closely review the body of this, however, and it would benefit from the eyes of somebody with more Rust experience...
@Keats One question, has our support for streaming from `stdin` changed in any measurable way with the port to the `seqio` parsing model? Just one thing we should be sure is stable that may not have the same test coverage.
@@ -8,61 +8,53 @@ Needletail is a MIT-licensed, minimal-copying FASTA/FASTQ parser and _k_-mer pro
The goal is to write a fast *and* well-tested set of functions that more specialized bioinformatics programs can use.
Needletail's goal is to be as fast as the [readfq](https://github.com/lh3/readfq) C library at parsing FASTX files and much (i.e. 25 times) faster than equivalent Python implementations at _k_-mer counting.

# Example
## Example
Is this intentional? Looks a bit like the whole block got commented out in an editor. (Referring not just to the header but to the `rust` code block below too)
The # part is intentional, the commented out code not so much...
## Acknowledgements
Starting from 0.4, the parsers algorithms is taken from [seq_io](https://github.com/markschl/seq_io). While it has been slightly modified, it is mainly
coming that library. Links to the original files are available in `src/parser/fast{a,q}.rs`.
Typo: "coming from that library".
Fair point about controlling your deps. But I also tested changing that
I'm ok with niffler if it can be changed to MultiGzDecoder, I didn't know if that was ok to change or not as it is pretty niche.
It's niche, but it works fine for regular
niffler 2.1.1 released, with
Cheers, can you make a PR with your changes?
…#46)
* Use cursor and Box to avoid Seek bound
* use niffler
* fix feature, bring back stdin support, remove unwrap
* use niffler 2.1.1, with MultiGzDecoder as default
* use niffler 2.2.0, and add stdin tests
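As a rough illustration of the "Use cursor and Box to avoid Seek bound" item, the sketch below shows one common way to peek at a stream's first bytes without requiring `Seek`, which is what makes `stdin` usable as an input. It is std-only and is not the code from this commit or from niffler: `sniff_gzip` is a hypothetical helper, and only the gzip magic bytes `0x1f 0x8b` are checked. The consumed bytes are replayed through a `Cursor` chained in front of the rest of the reader, and the result is boxed so callers get a single `Box<dyn Read>` type regardless of the input.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper (not niffler's or needletail's API): report whether the
/// stream starts with the gzip magic bytes, and return a reader that still
/// yields the full original stream. Only `Read` is required, never `Seek`.
fn sniff_gzip<'a, R: Read + 'a>(mut reader: R) -> io::Result<(bool, Box<dyn Read + 'a>)> {
    let mut magic = [0u8; 2];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until full or EOF.
    while filled < magic.len() {
        let n = reader.read(&mut magic[filled..])?;
        if n == 0 {
            break; // input shorter than the magic number
        }
        filled += n;
    }
    let is_gzip = filled == magic.len() && magic == [0x1f, 0x8b];
    // Replay the bytes already consumed, then continue with the untouched remainder.
    let replay = Cursor::new(magic[..filled].to_vec());
    Ok((is_gzip, Box::new(replay.chain(reader))))
}
```

Calling `sniff_gzip(std::io::stdin())` would work the same way as passing a file, since nothing here rewinds the underlying stream. The cost of this design is one small allocation and dynamic dispatch on reads.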
@bovee do you have a new email I can put in the Cargo.toml?
First initial, last name at gmail. Maybe your email should be in the Cargo.toml instead though since you've fixed up most of the code? ;)
I've done both!
@luizirber looks like pyo3 0.11 is requiring
Closes DEV-3520, closes DEV-3519, closes DEV-3521.