-
-
Notifications
You must be signed in to change notification settings - Fork 942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Rewrite PipeStream and SshCommand stream handling #144
Conversation
InputStream is now implemented. It is now possible to fully control the process execution and implement things like git+ssh or rsync+ssh on top of SSH.NET. |
Great contributions. |
I didn't see any reason to buffer data in the pipe as that will only increase memory use without improving throughput so my PipeStream implementation keeps only one window of data and expects someone to read it and writing will block until it has been completely read out. The absolute best way to handle the incoming data from SSH would be to (optionally) throw out the streams and use the events instead as that would completely eliminate read blocking and unnecessary threading/locking that's caused by it. I'm not 100% sure what the expected behavior of PipeStream is as the non-blocking exception thing was there that I removed and there's a bit of confusing going on with my commits regarding disposing the streams and letting the last buffer to be read out. Flushing was completely removed for instance as I didn't see the need for it in the simple version of PipeStream but it might not be the right move. Regarding performance I have some doubts a linked list would be any faster as it would increase the amount of work to be done if there's a reader ready to empty it at all times (like A linked list would also change the behavior from (my) blocking back to non-blocking and it would allow the writer to exhaust the memory if the reader is slow for any reason where usually the speed of transfer is determined by the complete blocking chain. The original PipeStream implementation resulted in around 50-55% CPU use where my hanged around 20-30%. Throughput during my test was a constant 1Gbit/s with both and OpenSSH hitting local network limit. |
I think we can agree that the original implementation left "some" room for improvement. |
Streams are supposed to be blocking (right?) so the default being blocking would be something I'd expect. The Making it all not block would be best done by having events as it removes the pipes completely and if no one is listening then the data is simply just lost. Blocking the writer when not using events looks like the right call to me. The big issue is handling reading both stdout and stderr simultaneously. They can both become large if buffered in-memory and no one is reading them. The simple solution for the user of the api would be to dispose the extended stream immediately to prevent it from blocking if the process writes to stderr and the user does not care about it. In summary, changing the api of SshCommand and/or adding another command class for blocking use might be worth a shot to support different use cases. I'm a little bit annoyed when people use the TTY method to run simple commands (with Expect kludge) when SshCommand is usually the right thing to do when you're not running anything interactive so getting it user friendly and performant at the same time is something I have a lot of interest in. |
I thought about the blocking problem more and I think a simple works-for-everyone solution would be to make SshCommand export the streams as PipeStream instead of plain Stream and have a property for switching between blocking and non-blocking (buffering) mode. Depending on your performance testing, if a linked list is not significantly slower than the raw byte[] thing I have when there's a reader always available, switching between blocking and non-blocking mode would just add a lock to writer when there's something in the list or alternatively it would switch between a list and a single byte[] buffer if there's a major performance difference. I can see this benefiting my use case as well as then I could keep OutputStream blocking but let ExtendedOutputStream buffer everything in-memory when I'm not expecting there to be much. It would also make a nice default if all streams buffer as you want them but the property be exposed to switch them on-demand. What do you think? |
I'm still incliding to go for the buffered (non-blocking write, blocking read) approach. |
You can execute 'cat' on the remote system and then loopback everything through it from InputStream to OutputStream. I used some random binary data (files) and hashed them after piping through and measured the CPU time while it was doing that. |
https://gist.github.com/hifi/9213b8981a938be7fd2dda4790a40dfb an alternative PipeStream that queues everything. It seems to be as heavy as the original implementation (on Mono) but the code is a bit cleaner. Reader is blocking but writer isn't. |
More testing reveals that the buffering version I gisted does not have almost any performance impact compared to the blocking one I have in this pull request making it the clear winner supporting both workflows. On Mono, however, the buffering implementation is CPU intensive but that's something that needs to be tackled on that side. |
Here are some actual benchmarks of the two implementations on three different runtimes: https://github.com/hifi/PipeBench |
Let me start by saying: thanks for all the effort you've put into this (getting nothing but semi-negative feedback from me). I don't have much time to spend on this right now (got two branches called Sofie and Lisa that require attention). Usage of BlockingCollection is a no-go since it's only available as of .NET 4+. Feel free to disregard these "negative" remarks until I have time to work with you on this. |
Technically BlockingCollection is exactly what you needed. It allows the writer to write without blocking (unless a reader blocks it). I'll see if I can find an alternative solution. There is no buffer size limitation in my BlockingCollection approach as it copies the write buffers as they are to the collection. I can replace BlockingCollection with a home grown solution to get better backwards compatiblity. It might also help with the performance on Mono. Personal branches always before GitHub. Take care! |
Added an implementation of BufferingPipeStream to the benchmark that does its own locking around Queue. It is the fastest implementation of them all on Microsoft runtimes. |
Apart from both memory and performance optimizations, one of the most important changes in the new implementation is that Before we can merge this new implementation, we should:
|
@hifi What do you think about this: should we use PipeStream.Dispose() to mark the stream as EOF, or introduce a separate (internal) method for this purpose? |
@drieseng The |
@drieseng Would an empty write do it? A write with zero count would queue normally but when the reader hits that empty buffer it would also return and not dequeue that from the linked list anymore? |
Actually I had a discussion about Streams on IRC and my eyes were opened that on C# a Stream instance should only do one direction at a time. A pipe should consist of two streams where closing the writer stream also indicates EOF. We should do that instead. |
Yeah, we could go for a PipeInputStream/PipeOutputStream approach. |
https://gist.github.com/hifi/3882f07f9064ce87e1a43d27a83888b3 I haven't even run this once and it might be overkill and I might have refactored too much but in this you create a Tried to share as much code as possible between the streams. |
I think I'd prefer to split PipeStream into PipeInputStream and PipeOutputStream, and maybe have PipeInputStream "read" from PipeOutputStream (not using the |
Want me to propose something? I think I'll have time tomorrow or Friday. |
You got the streams mixed but sure, propose something so we have things to compare. One design should stick out to be the cleanest and easiest to understand. I thought about having the pipe split into two streams but the "ownership" of the data was something that seemed odd to me if one of them had some sort of authority. That's why I used a nested class to hold the state/data and shared that between the two pipes. They could be split into two different classes instead of having a single class that does one of them depending on how it's initialized. I'll keep experimenting with the design as well. |
I don't think I mixed the streams, but feel free to correct me. I won't have time to work on this today :( |
So it was me who got them mixed up. I'm looking into this today to get those separated and properly named. |
Since SshCommand is essentially remote System.Diagnostics.Process, why wouldn't we mimic the API as much as we can as it's something people are already used to and it provides all the required things? Edit: What I mean is that we could instead deprecate SshCommand itself and create SshProcess that's API compatible with Process. This wouldn't break existing use for the next release but provide something you should transition to. |
Will it be possible to emulate the following command with this new functionality?
This capability is very desired in our project. For now we are planning to implement it ourselves, but it will be great to see that in an official build of SSH.NET. |
That is one possible use case for this PR with stdin support. |
Sure, PipeBench needs a little bit of cleaning to make it compatible with the double stream approach which should be the one we aim for. |
@hifi Could you have a look at performance changes I committed for SFTP download? More specifically, please have a look at SftpFileReader and tell me what you would change or how you would've implemented it. There's still quite some stuff left to do in that class. Mostly around exceptional handling, but more importantly around cancellation (e.g. how to cancel a blocking wait for a semaphore without having a CancelleationToken?). I've been thinking about having our proper implementation of CancellationToken that we can use when we're targeting frameworks that don't provide this, but I'm sure there are simpler solutions ... |
Out of curiosity, what needs to be done to get this PR completed? I love the idea of exposing a |
I think we kind of abandoned the process idea, and were discussing the design of the PipeStream. I first want to release 2016.1.0-beta2 (about time, it seems) before I continue working on this issue, but I'd gladly welcome you to join the discussion (and - if possible - help with the implementation). |
Did we abandon the process idea? I just proposed it a few messages up. It did intrigue me to make it look like a remote I haven't been active around this project for a few weeks now but I'll get back to it. |
"Abondon" was perhaps a bit too strong worded. I though we discussed it, and decided to at least postpone it (and instead first focus on the PipeStream design and implementation). But I can't find any trace of it, so I'm probably mistaken. My bad. Personally, I'd still prefer to move forward like this:
I'd happily discuss and design the remote process concept (even if the new PipeStream implementation does not introduce too many breaking change), but I know my time is limited (even though I spend already way too much time on SSH.NET). If you guys have enough time available, then please create a new issue for this., I'll participate in the discussions on a best-effort basis. |
All the above seems to be quite demanding (timewise).
(Sorry, new here, but just wasted quite some time figuring out why OutputStream does never return anything :-) ) |
I kind of dropped the ball regarding this PR and went into other things. I'd still like to get it going but need to find the motivation again. Regardless the performance is excellent and could even surpass SFTP reading the performance issues it had. |
I'd also like this and other PRs to get going again, but I have too little time and would like more interaction/discussion with other devs (eg. on issue #194). |
I would not be able to help on those issue, but I will try to do two things (since I need them anyway):
I can make the changes available. |
All, any word on this PR? A working OutputStream for the SshCommand would be really nice. Right now I've hacked together some code wrapping the OutputStream to force necessary blocking for a temporary workaround on my end, but if this PR is too large of a scope to get out in the near future, would you consider putting something out just to address this smaller chunk of needed (and easily addressed) functionality. |
Any progress on this PR? Our project has a desire that mimic's 21region's example code:
|
I think the easiest way is to create something that has a similar API to Unit testing against said I haven't made any progress towards this yet so it's open to be picked up as long as there's a pre-accepted solution so no one would be doing more non-mergeable work than I already did. |
I don't know if this is going to work, but I'm cornered enough that I'm now going to attempt the following:
Wish me luck 😄 |
Hi Looking for the same. |
Any progress on this would be most appreciated. Just recently, i was trying to utilize blocking reads with SSH.NET's output I am not sure if it would make sense to file an issue reportwith regard to my issue (under the condition that i cannot find an equivalent issue report, of course), considering that the issue would probably disappear (or in the worst case morph into a different issue) if the changes discussed here would be implemented in SSH.NET... |
When it will be available? I would like to send disk image datas from C# to a stdin pipe of dd over SSH, like in this example. I would like to do the following same thing which works in Linux. # dd bs=16M if=/dev/sda | ssh root@serverB "dd bs=16M of=/dev/sda" |
There is a fork that has this feature |
|
string
stream completely
can completely block the execution as it will hang the event
handler indefinitely if no one is reading the other end
completely emptied it
This has not been tested with anything else than SshCommand in a very limited way. I'm planning on adding InputStream on top of this if this refactor is in the right direction.