
RFC: Rewrite PipeStream and SshCommand stream handling #144

Closed · wants to merge 3 commits

@hifi (Contributor) commented Dec 28, 2016

  • More efficient when moving large amounts of binary data
    • Speedup on Mono: ~1.71x
    • Speedup on Win10: ~1.39x
  • No in-memory buffering of data
    • Except if you use the Execute() method to return a result
      string
    • The result string will be empty if you have read the output
      stream completely
  • Unless you close the error stream or have something reading it,
    execution can block entirely, as the event handler will hang
    indefinitely while no one reads the other end
  • Reading from PipeStream will now always block until EOF
  • Writing to PipeStream will now always block until a reader has
    completely emptied it

This has not been tested with anything other than SshCommand, and only in a very limited way. I'm planning to add InputStream on top of this if the refactor is headed in the right direction.

@hifi (Contributor, Author) commented Dec 29, 2016

InputStream is now implemented. It is now possible to fully control the process execution and implement things like git+ssh or rsync+ssh on top of SSH.NET.

@drieseng (Member)

Great contributions.
I'll review them in detail ASAP.
Can you start by adding more unit tests for PipeStream?
I was planning to use a linked-list implementation for PipeStream, but that should not affect the behavior (and as such, it should not affect the tests you add).
I'll try to find time for that, and based on performance tests we'll either use that implementation or yours.
OK?

@hifi (Contributor, Author) commented Dec 29, 2016

I didn't see any reason to buffer data in the pipe, as that only increases memory use without improving throughput. My PipeStream implementation keeps a single window of data and expects someone to read it; writing blocks until the window has been completely read out.

The absolute best way to handle the incoming data from SSH would be to (optionally) drop the streams and use events instead, as that would completely eliminate read blocking and the unnecessary threading/locking it causes.

I'm not 100% sure what the expected behavior of PipeStream is: there was a non-blocking exception mechanism that I removed, and there's some confusion in my commits around disposing the streams and letting the last buffer be read out. Flushing was removed entirely, for instance, because I didn't see the need for it in the simple version of PipeStream, but that might not be the right move.

Regarding performance, I have some doubts a linked list would be any faster: it increases the amount of work to be done when a reader is ready to empty the pipe at all times (as with Stream.CopyTo()), so the list would mostly contain a single element. It would also require supporting partial reads of a written byte buffer, which adds complexity.

A linked list would also change the behavior from (my) blocking back to non-blocking, allowing the writer to exhaust memory if the reader is slow for any reason, whereas usually the transfer speed is governed by the complete blocking chain.

The original PipeStream implementation resulted in around 50-55% CPU use, while mine stayed around 20-30%. Throughput during my test was a constant 1 Gbit/s with both implementations, with OpenSSH hitting the local network limit.
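The single-window design described above can be sketched roughly like this (a simplified, hypothetical illustration, not the actual PR code): the pipe holds at most one pending buffer, and Write does not return until a reader has drained it.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a single-window blocking pipe: one pending
// buffer at a time; Write blocks until a reader has consumed it all.
public sealed class SingleWindowPipe
{
    private readonly object _sync = new object();
    private byte[] _window;      // current pending data, null when empty
    private int _pos;            // read position within _window
    private bool _eof;

    public void Write(byte[] buffer, int offset, int count)
    {
        lock (_sync)
        {
            // Wait until any previous window has been fully read out.
            while (_window != null)
                Monitor.Wait(_sync);

            _window = new byte[count];
            Buffer.BlockCopy(buffer, offset, _window, 0, count);
            _pos = 0;
            Monitor.PulseAll(_sync);

            // Block the writer until the reader empties this window.
            while (_window != null)
                Monitor.Wait(_sync);
        }
    }

    public int Read(byte[] buffer, int offset, int count)
    {
        lock (_sync)
        {
            while (_window == null && !_eof)
                Monitor.Wait(_sync);
            if (_window == null)
                return 0; // EOF

            int n = Math.Min(count, _window.Length - _pos);
            Buffer.BlockCopy(_window, _pos, buffer, offset, n);
            _pos += n;
            if (_pos == _window.Length)
            {
                _window = null; // window drained, release the writer
                Monitor.PulseAll(_sync);
            }
            return n;
        }
    }

    public void Close()
    {
        lock (_sync) { _eof = true; Monitor.PulseAll(_sync); }
    }
}
```

With a Stream.CopyTo-style reader always ready, the writer is released almost immediately, which is why the strict blocking costs little in practice.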

@drieseng (Member)

I think we can agree that the original implementation left "some" room for improvement.
However, I don't think we should block writing until the buffer has been read.
Your implementation may be the fastest and cover your specific use case, but let's see if we can support more use cases (without sacrificing all too much performance).

@hifi (Contributor, Author) commented Dec 29, 2016

Streams are supposed to be blocking (right?) so the default being blocking would be something I'd expect. The Result property and EndExecute method both empty out the stream properly with my PipeStream implementation after shuffling SshCommand a bit so it should work even with the blocking behavior.

Making it all not block would be best done by having events as it removes the pipes completely and if no one is listening then the data is simply just lost. Blocking the writer when not using events looks like the right call to me.

The big issue is handling reading both stdout and stderr simultaneously. They can both become large if buffered in-memory and no one is reading them. The simple solution for the user of the api would be to dispose the extended stream immediately to prevent it from blocking if the process writes to stderr and the user does not care about it.
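With blocking semantics, the usual caller-side answer to the stderr problem is the standard drain-on-a-background-thread pattern; a generic Stream sketch (illustrative names, not SSH.NET API):

```csharp
using System.IO;
using System.Threading;

// General pattern: drain a stream you don't care about on a background
// thread so the producer never blocks waiting for someone to read it.
public static class StreamDrain
{
    public static Thread DrainInBackground(Stream s)
    {
        var t = new Thread(() =>
        {
            var scratch = new byte[4096];
            // Read and discard until the stream reports EOF.
            while (s.Read(scratch, 0, scratch.Length) > 0) { }
        }) { IsBackground = true };
        t.Start();
        return t;
    }
}
```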

In summary, changing the api of SshCommand and/or adding another command class for blocking use might be worth a shot to support different use cases.

I'm a little annoyed when people use the TTY method to run simple commands (with the Expect kludge), when SshCommand is usually the right tool for anything non-interactive. Getting it user-friendly and performant at the same time is something I have a lot of interest in.

@hifi (Contributor, Author) commented Dec 29, 2016

I thought about the blocking problem more and I think a simple works-for-everyone solution would be to make SshCommand export the streams as PipeStream instead of plain Stream and have a property for switching between blocking and non-blocking (buffering) mode.

Depending on your performance testing: if a linked list is not significantly slower than my raw byte[] approach when a reader is always available, switching between blocking and non-blocking mode would just add a lock on the writer while there's something in the list; alternatively, it could switch between a list and a single byte[] buffer if there's a major performance difference.

I can see this benefiting my use case as well: I could keep OutputStream blocking but let ExtendedOutputStream buffer everything in memory when I'm not expecting much of it. Buffering on all streams would also make a nice default, with the property exposed to switch modes on demand.

What do you think?

@drieseng (Member)

I'm still inclined to go for the buffered (non-blocking write, blocking read) approach.
However, give me some time to look into it.
Meanwhile, do you have (more or less self-contained) integration tests that you use for testing both input and output streams?

@hifi (Contributor, Author) commented Dec 29, 2016

You can execute 'cat' on the remote system and loop everything back through it, from InputStream to OutputStream. I used some random binary data (files), hashed it after piping it through, and measured the CPU time while it was running.
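The integrity check can be sketched like this; a MemoryStream stands in for the remote 'cat' here, since the real test would wire the InputStream and OutputStream of an SshCommand running cat (stream names per this PR's proposed API):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Sketch of the loopback verification idea: hash the payload before
// and after it round-trips through a pipe. A seekable MemoryStream
// stands in for the remote 'cat' loopback in this illustration.
public static class LoopbackCheck
{
    public static bool RoundTripsIntact(byte[] payload, MemoryStream loopback)
    {
        loopback.Write(payload, 0, payload.Length);
        loopback.Position = 0;

        var echoed = new MemoryStream();
        loopback.CopyTo(echoed);

        using (var sha = SHA256.Create())
        {
            // Compare SHA-256 digests of the original and echoed data.
            return Convert.ToBase64String(sha.ComputeHash(payload))
                == Convert.ToBase64String(sha.ComputeHash(echoed.ToArray()));
        }
    }
}
```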

@hifi (Contributor, Author) commented Dec 30, 2016

https://gist.github.com/hifi/9213b8981a938be7fd2dda4790a40dfb is an alternative PipeStream that queues everything. It seems to be about as heavy as the original implementation (on Mono), but the code is a bit cleaner. The reader blocks but the writer doesn't.
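In outline, the queueing variant looks something like this (a simplified sketch, not the gist's actual code): writes copy the buffer into a queue and return immediately, while reads block until data arrives or the pipe is closed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch of the buffering variant: writes enqueue copies and never
// block; reads block until data is available or the pipe is closed.
public sealed class BufferingPipe
{
    private readonly Queue<byte[]> _chunks = new Queue<byte[]>();
    private readonly object _sync = new object();
    private byte[] _current;  // chunk currently being read out
    private int _pos;
    private bool _closed;

    public void Write(byte[] buffer, int offset, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        lock (_sync)
        {
            _chunks.Enqueue(copy);
            Monitor.Pulse(_sync);
        }
    }

    public int Read(byte[] buffer, int offset, int count)
    {
        lock (_sync)
        {
            while (_current == null && _chunks.Count == 0 && !_closed)
                Monitor.Wait(_sync);
            if (_current == null && _chunks.Count == 0)
                return 0; // closed and fully drained

            if (_current == null) { _current = _chunks.Dequeue(); _pos = 0; }
            int n = Math.Min(count, _current.Length - _pos);
            Buffer.BlockCopy(_current, _pos, buffer, offset, n);
            _pos += n;
            if (_pos == _current.Length) _current = null;
            return n;
        }
    }

    public void Close()
    {
        lock (_sync) { _closed = true; Monitor.PulseAll(_sync); }
    }
}
```

The trade-off versus the single-window version is exactly the one debated above: the writer is never throttled, so a slow reader lets the queue grow without bound.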

@hifi (Contributor, Author) commented Dec 30, 2016

More testing reveals that the buffering version I gisted has almost no performance impact compared to the blocking one in this pull request, making it the clear winner since it supports both workflows.

On Mono, however, the buffering implementation is CPU-intensive, but that's something that needs to be tackled on the Mono side.

@hifi (Contributor, Author) commented Dec 31, 2016

Here are some actual benchmarks of the two implementations on three different runtimes: https://github.com/hifi/PipeBench

@drieseng (Member)

Let me start by saying: thanks for all the effort you've put into this (while getting nothing but semi-negative feedback from me). I don't have much time to spend on this right now (I've got two branches called Sofie and Lisa that require attention).

Usage of BlockingCollection is a no-go, since it's only available as of .NET 4.
I think performance can be improved (and allocations reduced) by using a linked-list approach.
I don't like the fact that the buffer size is limited to 64 KB.

Feel free to disregard these "negative" remarks until I have time to work with you on this.

@hifi (Contributor, Author) commented Jan 1, 2017

Technically, BlockingCollection is exactly what you asked for: it allows the writer to write without blocking (unless a reader blocks it). I'll see if I can find an alternative solution.

There is no buffer size limitation in my BlockingCollection approach, as it copies the write buffers as-is into the collection. I can replace BlockingCollection with a home-grown solution for better backwards compatibility; it might also help with performance on Mono.

Personal branches always before GitHub. Take care!

@hifi (Contributor, Author) commented Jan 1, 2017

Added an implementation of BufferingPipeStream to the benchmark that does its own locking around a Queue. It is the fastest implementation of them all on Microsoft runtimes.

@drieseng (Member) commented Jan 2, 2017

Apart from both memory and performance optimizations, one of the most important changes in the new implementation is that Read(byte[] buffer, int offset, int count) will block until either the requested number of bytes is available or the stream is closed.
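For contrast, that guarantee is exactly the loop that callers of a conventional Stream.Read (which may return fewer bytes than requested) have to write themselves; roughly:

```csharp
using System.IO;

// The new contract, expressed as the loop callers previously needed:
// keep reading until 'count' bytes arrive or the stream signals EOF.
public static class StreamExtras
{
    public static int ReadExactly(Stream s, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, offset + total, count - total);
            if (n == 0) break; // stream closed before 'count' bytes arrived
            total += n;
        }
        return total;
    }
}
```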

Before we can merge this new implementation, we should:

  • Add unit tests that fully cover the new implementation.

  • Review the places in our codebase where we use PipeStream and (may) count on it blocking only until some data is available.

  • Decide whether to use PipeStream.Dispose() to mark the stream as EOF, or introduce a separate (internal) method for this purpose.

  • Modify our codebase to use the appropriate mechanism to signal that no more data is available.

  • Fix unit tests that are broken due to the new behavior.

  • Ensure we document the public-facing part of this breaking change for the next release, and provide a clear indication where users may be confronted with this change and how they can or should deal with it.

@drieseng (Member) commented Jan 2, 2017

@hifi What do you think about this: should we use PipeStream.Dispose() to mark the stream as EOF, or introduce a separate (internal) method for this purpose?

@hifi (Contributor, Author) commented Jan 3, 2017

@drieseng The Read() method is allowed to throw ObjectDisposedException by definition so it might be better to set EOF with an internal method.

@hifi (Contributor, Author) commented Jan 4, 2017

@drieseng Would an empty write do it? A write with zero count would queue normally, but when the reader hits that empty buffer it would also return, without ever dequeuing it from the linked list?

@hifi (Contributor, Author) commented Jan 4, 2017

Actually, I had a discussion about streams on IRC and my eyes were opened: in C#, a Stream instance should only handle one direction at a time. A pipe should consist of two streams, where closing the writer stream also signals EOF. We should do that instead.

@drieseng (Member) commented Jan 4, 2017

Yeah, we could go for a PipeInputStream/PipeOutputStream approach.
I don't like the fact that in .NET a stream exposes both read and write members.

@hifi (Contributor, Author) commented Jan 4, 2017

https://gist.github.com/hifi/3882f07f9064ce87e1a43d27a83888b3

I haven't even run this once, it might be overkill, and I may have refactored too much, but in this design you create a Pipe that has two public properties, InputStream and OutputStream. You can then decide who gets which; each end should only allow the operations you're supposed to perform, and closing InputStream would signal EOF.

Tried to share as much code as possible between the streams.
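In outline, the two-stream shape looks like this (hypothetical names using the Java-style convention settled on later in the thread, and a byte-at-a-time queue purely for brevity): the Pipe owns the shared state, hands out a write-only and a read-only stream, and disposing the writer's end signals EOF.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical sketch of the two-stream pipe design; not the gist code.
public sealed class Pipe
{
    private readonly Queue<byte> _data = new Queue<byte>();
    private readonly object _sync = new object();
    private bool _closed;

    public Stream OutputStream { get; }  // writer's end (write-only)
    public Stream InputStream { get; }   // reader's end (read-only)

    public Pipe()
    {
        OutputStream = new WriterEnd(this);
        InputStream = new ReaderEnd(this);
    }

    private sealed class WriterEnd : Stream
    {
        private readonly Pipe _p;
        public WriterEnd(Pipe p) { _p = p; }

        public override void Write(byte[] buffer, int offset, int count)
        {
            lock (_p._sync)
            {
                for (int i = 0; i < count; i++) _p._data.Enqueue(buffer[offset + i]);
                Monitor.PulseAll(_p._sync);
            }
        }

        protected override void Dispose(bool disposing)
        {
            // Closing the writer marks EOF for the reader.
            lock (_p._sync) { _p._closed = true; Monitor.PulseAll(_p._sync); }
            base.Dispose(disposing);
        }

        public override bool CanRead => false;
        public override bool CanWrite => true;
        public override bool CanSeek => false;
        public override void Flush() { }
        public override int Read(byte[] b, int o, int c) => throw new NotSupportedException();
        public override long Seek(long o, SeekOrigin s) => throw new NotSupportedException();
        public override void SetLength(long v) => throw new NotSupportedException();
        public override long Length => throw new NotSupportedException();
        public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
    }

    private sealed class ReaderEnd : Stream
    {
        private readonly Pipe _p;
        public ReaderEnd(Pipe p) { _p = p; }

        public override int Read(byte[] buffer, int offset, int count)
        {
            lock (_p._sync)
            {
                while (_p._data.Count == 0 && !_p._closed)
                    Monitor.Wait(_p._sync);
                int n = Math.Min(count, _p._data.Count);
                for (int i = 0; i < n; i++) buffer[offset + i] = _p._data.Dequeue();
                return n; // 0 means the writer was disposed (EOF)
            }
        }

        public override bool CanRead => true;
        public override bool CanWrite => false;
        public override bool CanSeek => false;
        public override void Flush() { }
        public override void Write(byte[] b, int o, int c) => throw new NotSupportedException();
        public override long Seek(long o, SeekOrigin s) => throw new NotSupportedException();
        public override void SetLength(long v) => throw new NotSupportedException();
        public override long Length => throw new NotSupportedException();
        public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
    }
}
```

Handing OutputStream to the producer and InputStream to the consumer keeps each side honest: neither end can perform the operations it isn't supposed to.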

@drieseng (Member) commented Jan 4, 2017

I think I'd prefer to split PipeStream into PipeInputStream and PipeOutputStream, and maybe have PipeInputStream "read" from PipeOutputStream (not using the Stream.Read(byte[] buffer, int offset, int count) method on PipeOutputStream of course)

@drieseng (Member) commented Jan 4, 2017

Want me to propose something? I think I'll have time tomorrow or Friday.

@hifi (Contributor, Author) commented Jan 5, 2017

You got the streams mixed up, but sure, propose something so we have things to compare. One design should stand out as the cleanest and easiest to understand.

I thought about splitting the pipe into two streams, but the "ownership" of the data seemed odd to me if one of them had some sort of authority. That's why I used a nested class to hold the state/data and shared it between the two ends. They could be split into two different classes instead of a single class that plays one role depending on how it's initialized.

I'll keep experimenting with the design as well.

@drieseng (Member) commented Jan 5, 2017

I don't think I mixed up the streams, but feel free to correct me.
Those are the names used in Java.
An InputStream in Java is actually a read-only stream.

I won't have time to work on this today :(

@hifi (Contributor, Author) commented Jan 6, 2017

So it was me who got them mixed up. I'm looking into this today to get them separated and properly named.

@hifi (Contributor, Author) commented Jan 11, 2017

Since SshCommand is essentially a remote System.Diagnostics.Process, why wouldn't we mimic that API as much as we can? It's something people are already used to, and it provides everything required.

Edit: What I mean is that we could instead deprecate SshCommand itself and create an SshProcess that's API-compatible with Process. This wouldn't break existing usage in the next release, but would provide something to transition to.
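Purely as a hypothetical sketch of that idea (none of these members exist in SSH.NET; the names simply mirror System.Diagnostics.Process):

```csharp
using System.IO;

// Hypothetical shape for the SshProcess proposal: mirror the parts of
// System.Diagnostics.Process that matter for remote execution.
public interface ISshProcess
{
    StreamWriter StandardInput { get; }   // like Process.StandardInput
    StreamReader StandardOutput { get; }  // like Process.StandardOutput
    StreamReader StandardError { get; }   // like Process.StandardError
    int ExitCode { get; }
    bool HasExited { get; }
    void WaitForExit();
}

// A trivial in-memory fake, just to show the intended call pattern.
public sealed class FakeSshProcess : ISshProcess
{
    public StreamWriter StandardInput { get; } = new StreamWriter(new MemoryStream());
    public StreamReader StandardOutput { get; } =
        new StreamReader(new MemoryStream(System.Text.Encoding.ASCII.GetBytes("hello\n")));
    public StreamReader StandardError { get; } = new StreamReader(new MemoryStream());
    public int ExitCode => 0;
    public bool HasExited => true;
    public void WaitForExit() { }
}
```

As suggested later in the thread, such a class could be unit tested by running the same command locally through Process and comparing results.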

@21region

Will it be possible to emulate the following command with this new functionality?

cat file.tar.gz | ssh user@server 'tar -C /path/to/dest/dir -xjf -'

This capability is very much desired in our project. For now we are planning to implement it ourselves, but it would be great to see it in an official build of SSH.NET.

@hifi (Contributor, Author) commented Jan 11, 2017

That is one possible use case for this PR with stdin support.

@drieseng (Member)

@hifi The little time I had this week, I've been working on issue #100. I'm making good progress there, so I should soon be able to pick up work on this issue.

@drieseng (Member)

@hifi Can we move back to using your PipeBench in order to arrive at a stable API that we both feel comfortable with?

I could also use your help on the performance fix I'm preparing for issue #100.
Note that I still need to commit or submit a PR for this.

@hifi (Contributor, Author) commented Jan 18, 2017

Sure. PipeBench needs a little cleaning to make it compatible with the double-stream approach, which should be the one we aim for.

@drieseng (Member) commented Jan 25, 2017

@hifi Could you have a look at performance changes I committed for SFTP download? More specifically, please have a look at SftpFileReader and tell me what you would change or how you would've implemented it.

There's still quite some work left to do in that class, mostly around exception handling, but more importantly around cancellation (e.g. how do you cancel a blocking wait on a semaphore without a CancellationToken?). I've been thinking about having our own implementation of CancellationToken for the frameworks we target that don't provide one, but I'm sure there are simpler solutions ...

@alanmcgovern

Out of curiosity, what needs to be done to get this PR completed? I love the idea of exposing a RemoteProcess type to mimic System.Diagnostics.Process. That approach would clean up some of my code nicely!

@drieseng (Member)

I think we kind of abandoned the process idea, and were discussing the design of the PipeStream.
I proposed to first finish it in @hifi's PipeBench and then integrate it into SSH.NET. Since then I mostly worked on other issues.

I first want to release 2016.1.0-beta2 (about time, it seems) before I continue working on this issue, but I'd gladly welcome you to join the discussion (and - if possible - help with the implementation).

@hifi (Contributor, Author) commented Feb 14, 2017

Did we abandon the process idea? I just proposed it a few messages up. The idea of making it look like a remote System.Diagnostics.Process does intrigue me, if it makes sense to people.

I haven't been active around this project for a few weeks now but I'll get back to it.

@drieseng (Member)

"Abandon" was perhaps a bit too strongly worded. I thought we discussed it and decided to at least postpone it (and instead focus first on the PipeStream design and implementation). But I can't find any trace of that, so I'm probably mistaken. My bad.

Personally, I'd still prefer to move forward like this:

  • Design and implement the PipeStream (including unit tests)
  • Analyze the impact of replacing the current PipeStream implementation.
  • If this would cause too many breaking changes, then we MUST introduce a new remote-process concept.
  • If not, we integrate the new implementation into SSH.NET and document any change in behavior.

I'd happily discuss and design the remote-process concept (even if the new PipeStream implementation does not introduce too many breaking changes), but I know my time is limited (even though I already spend way too much time on SSH.NET).

If you guys have enough time available, then please create a new issue for this; I'll participate in the discussions on a best-effort basis.

@mloefler

All the above seems to be quite demanding (timewise).
So I have some quick suggestions:

  • Make OutputStream protected in SshCommand, since at the end of the
    execution it is always empty
  • Add an OnData event to SshCommand so I can implement my own
    streaming, optionally avoiding filling the OutputStream, so memory
    consumption is no longer an issue.

(Sorry, new here, but I just wasted quite some time figuring out why OutputStream never returns anything :-) )

@hifi (Contributor, Author) commented Apr 10, 2017

I kind of dropped the ball on this PR and moved on to other things. I'd still like to get it going but need to find the motivation again. Regardless, the performance is excellent and could even surpass SFTP reads, given the performance issues SFTP had.

@drieseng (Member)

I'd also like this and other PRs to get going again, but I have too little time and would like more interaction/discussion with other devs (e.g. on issue #194).

@mloefler

I would not be able to help on those issues, but I will try to do a few things (since I need them anyway):

  • Add an OnData event to SshCommand, which triggers from inside Channel_DataReceived. This enables me to implement my own data handling and/or streaming.
  • Add a class to 'stream' lines based on input from OnData.
  • Modify the OutputStream getter to throw if the stream has already been consumed.

I can make the changes available.
(Background: my ssh command extracts large sets of data that I need to process. Large meaning it could cause a memory issue if all of it resides in memory.)

@thenadz commented May 13, 2017

All, any word on this PR? A working OutputStream for SshCommand would be really nice. Right now I've hacked together some code wrapping OutputStream to force the necessary blocking as a temporary workaround on my end. If this PR is too large in scope to land in the near future, would you consider putting something out just to address this smaller chunk of needed (and easily addressed) functionality?

@drieseng mentioned this pull request Sep 11, 2017
@WalkerNA commented Jan 8, 2018

Any progress on this PR? Our project has a need that mimics 21region's example code:

cat file.tar.gz | ssh user@server 'tar -C /path/to/dest/dir -xjf -'

@hifi (Contributor, Author) commented Jan 9, 2018

I think the easiest way is to create something with an API similar to System.Diagnostics.Process and try it out like that, instead of trying to fix the old SshCommand. The old class could then be slowly deprecated, and there would be no need to maintain compatibility.

Unit testing the proposed SshProcess(?) class could be done by comparing local execution with Process.

I haven't made any progress on this yet, so it's open to be picked up, as long as there's a pre-accepted solution so no one ends up doing more non-mergeable work than I already did.

@masaeedu commented Jan 22, 2018

I don't know if this is going to work, but I'm cornered enough that I'm now going to attempt the following:

  • Create 3 temp files on the remote system
  • Use UploadFile and DownloadFile with SFTP to obtain input/output streams that actually work in a sensible way
  • Use SshCommand to start whatever I actually wanted to run under a shell instance, with I/O file descriptors redirected to these files. Totally ignore any streams SshCommand provides

Wish me luck 😄

@ghjkl799 commented Mar 9, 2020

Any progress on this PR? Our project has a need that mimics 21region's example code:

cat file.tar.gz | ssh user@server 'tar -C /path/to/dest/dir -xjf -'

Hi

Looking for the same.
Any update on progress?

@elgonzo commented Apr 3, 2020

Any progress on this would be most appreciated.

Just recently, I was trying to utilize blocking reads with SSH.NET's output PipeStream, but it seems the logic associated with the PipeStream.BlockLastReadBuffer property is either unintentional/broken or not really useful. If I want a blocking read of X bytes, why would the PipeStream.ReadAvailable(int count) method insist on blocking until X+1 bytes have been received?

I am not sure whether it would make sense to file an issue report with regard to this (provided I cannot find an equivalent report, of course), considering the issue would probably disappear (or, in the worst case, morph into a different one) if the changes discussed here were implemented in SSH.NET...

@Livius90 commented Sep 4, 2022

When will it be available? I would like to send disk image data from C# to the stdin pipe of dd over SSH, like in this example.

I would like to do the same thing that works on Linux:

# dd bs=16M if=/dev/sda | ssh root@serverB "dd bs=16M of=/dev/sda"

@WojciechNagorski (Collaborator) commented Sep 15, 2022

There is a fork that has this feature:
https://www.nuget.org/packages/TrapTech.SSH.NET
https://github.com/TrapTech/SSH.NET/
But I wish I could use the official package.

@Livius90

@Rob-Hague (Collaborator)

SshCommand.CreateInputStream was added in version 2024.0.0 in #1293 (https://sshnet.github.io/SSH.NET/examples.html#stream-data-to-a-command) and PipeStream was rewritten in #1399. I'll put this one to rest.
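For later readers, the shipped API looks roughly like this (a sketch adapted from the linked examples page; the host, credentials and the 'cat' command are placeholders, and it needs the Renci.SshNet package plus a reachable server, so treat the details as approximate):

```csharp
using System;
using System.IO;
using System.Text;
using Renci.SshNet;

using (var client = new SshClient("host", "user", "password"))
{
    client.Connect();
    using (SshCommand command = client.CreateCommand("cat"))
    {
        IAsyncResult asyncResult = command.BeginExecute();

        // Stream data to the remote command's stdin; disposing the
        // input stream signals EOF to the remote process.
        using (Stream inputStream = command.CreateInputStream())
        {
            byte[] data = Encoding.UTF8.GetBytes("Hello world!");
            inputStream.Write(data, 0, data.Length);
        }

        command.EndExecute(asyncResult);
        Console.WriteLine(command.Result); // expect the input echoed back by 'cat'
    }
}
```

This is the shipped equivalent of the `cat file | ssh ... 'tar ...'` and `dd | ssh ... dd` use cases asked about earlier in the thread.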

@Rob-Hague closed this May 23, 2024