
SecureBridge SSH/SFTP Server Stability issues

Posted: Tue 04 Oct 2016 14:48
by tcaduto12068
Hi,
I have a service app that hosts the SecureBridge SFTP server and an Indy FTP server in the same service.
We have had several occasions where the SecureBridge portion suddenly starts consuming 100% CPU, out of the blue, after one or two weeks of operation. While the SFTP server part is consuming 100% CPU, the Indy FTP portion continues to work fine, just more slowly, because all CPU cores are being maxed out by the SFTP server.

We have also had it randomly blink out of existence with a kernel exception and a memory dump.

This is with Lazarus 1.6 and FPC 3.0.

Have you guys done any type of heavy load testing with the SFTP server? I don't know if this is related to the SFTP server portion consuming a lot of system handles.

Any ideas, or any word on when an updated release is expected?

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Wed 05 Oct 2016 14:21
by ViktorV
Please clarify: does this situation occur when many users connect to the SFTP server, or when a single user uploads/downloads a large amount of data to/from the SFTP server?

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Wed 05 Oct 2016 15:48
by tcaduto12068
It happens when many connections are opened in very quick succession, even if they are from the same user.
For example, FileZilla lets you specify up to 10 concurrent connections for file transfers. I think it might be related to the creation of the FDataAvailable event failing for the data connection in TAsyncThread.Create.

Just recently we had a kernel error raised with a memory dump, where the SFTP server service just blinked out of existence. Just the other day the service maxed out at 100 percent CPU usage and pretty much shut down the entire server.

We also had an instance where an automated process that keeps a long-running connection kept getting a "Can't Create event error" over and over as it kept trying to upload files.

This is on Lazarus 1.6 and FPC 3.0. The kernel error may have been related to me using cmem (the C memory manager) instead of the FPC one.
I have removed cmem, recompiled the service and put it into production. As I said, some of the stability issues may have come from that. However, the 100% CPU use is almost certainly related to the event creation failing.
I also modified the event creation to do this:

Code: Select all

          
lasterrorcode := 0;
FDataAvailable := TEvent.Create(nil, True, False, '');
{$IFDEF MSWINDOWS}
lasterrorcode := GetLastError;
if lasterrorcode <> 0 then
begin
  // First attempt reported an error: wait one second and try once more
  Sleep(1000);
  FDataAvailable := TEvent.Create(nil, True, False, '');
  lasterrorcode := GetLastError;
  // Second failure: raise the exception as the original code did
  if lasterrorcode <> 0 then
    raise Exception.Create(Format('%s: Error Code %d, Error MSG:%s',
      [SCannotCreateEvent, lasterrorcode, SysErrorMessage(lasterrorcode)]));
end;
{$ENDIF}

 
If GetLastError returns a nonzero value, it sleeps for a second and then tries to create the FDataAvailable event again. If that fails a second time, it raises the exception as the original code did. It seems much more stable after I did this. I will let you guys determine whether this is a viable solution or not.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Wed 05 Oct 2016 19:24
by tcaduto12068
Another suggestion might be to add a SleepEx call, wrapped in a compiler define for Windows, to the while loops in ScVIO, i.e.

{$IFDEF MSWINDOWS}
SleepEx(0, True);
{$ENDIF}

in

procedure TAsyncReceiveThread.Execute;

and

procedure TAsyncSendThread.Execute;

It seems that, if the conditions were right, both of these could cause 100% CPU usage.
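
For readers following along, here is a minimal sketch of the kind of loop being discussed. It is not SecureBridge code: TPollingThread and its Iterations counter are hypothetical stand-ins, and the Sleep(1) used on non-Windows platforms is an assumption made only so the example runs everywhere.

```pascal
program YieldSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  {$IFDEF MSWINDOWS}Windows,{$ENDIF}
  Classes, SysUtils;

type
  // Hypothetical stand-in for an async I/O thread (not the SecureBridge class)
  TPollingThread = class(TThread)
  protected
    procedure Execute; override;
  public
    Iterations: Integer;
  end;

procedure TPollingThread.Execute;
begin
  while not Terminated do
  begin
    Inc(Iterations);      // placeholder for "check whether data is ready"
    {$IFDEF MSWINDOWS}
    SleepEx(0, True);     // the call suggested in the post
    {$ELSE}
    Sleep(1);             // assumption: short sleep on other platforms
    {$ENDIF}
  end;
end;

var
  T: TPollingThread;
begin
  T := TPollingThread.Create(False);
  Sleep(50);              // let the loop run briefly
  T.Terminate;
  T.WaitFor;
  WriteLn('loop iterations: ', T.Iterations);
  T.Free;
end.
```

Note that a zero timeout only gives up the rest of the time slice when another thread is ready to run, so a loop like this can still spin on an otherwise idle machine. A nonzero sleep or a blocking wait on the event would be the heavier-handed alternative.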

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Thu 06 Oct 2016 13:47
by ViktorV
We are working on improving this functionality, and we will try to change it in one of the next SecureBridge releases.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Thu 06 Oct 2016 14:24
by tcaduto12068
Ok, thanks.

I did have maxstartups set to 50 instead of the default 20, so I put that back to the default, and that seems to have helped a bit. Using the stock FPC memory manager instead of cmem also seemed to help, except that it now uses three times as much memory as cmem did.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Mon 10 Oct 2016 08:22
by ViktorV
Thank you for the information and your help with investigating the issue. The investigation is in progress. We will inform you when we have any results.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Mon 10 Oct 2016 11:54
by tcaduto12068
Thanks. I don't know if this will help, but compiling as 64-bit instead of 32-bit makes a huge difference. As 32-bit on Windows 10, the server raises an external exception within a few minutes, usually after a lot of concurrent uploads and after the 10 logins disconnect.
Compiled as 64-bit it still has issues, but I can usually upload 12,000 files before they appear.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Mon 10 Oct 2016 16:05
by tcaduto12068
Oh, and I found that if I set the async thread stack size in the inherited Create call in ScVIO to:
inherited Create(False,4*1024*1024); /// Input\Output error(5) for non-Windows platforms

The stability also seems to improve.
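
As a rough illustration of that change outside SecureBridge, the sketch below passes an explicit stack size to TThread.Create from a subclass constructor. TWorker and WorkerStackSize are hypothetical names; only the 4 MB value comes from the post.

```pascal
program StackSizeSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

const
  WorkerStackSize = 4 * 1024 * 1024;  // 4 MB, the value used in the post

type
  // Hypothetical worker thread (not the SecureBridge TAsyncThread)
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  public
    constructor Create;
  end;

constructor TWorker.Create;
begin
  // Second argument is the stack size in bytes; first is CreateSuspended
  inherited Create(False, WorkerStackSize);
end;

procedure TWorker.Execute;
begin
  // real work would go here
end;

var
  W: TWorker;
begin
  W := TWorker.Create;
  W.WaitFor;
  W.Free;
  WriteLn('worker finished');
end.
```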

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Wed 12 Oct 2016 12:58
by ViktorV
We will investigate the behavior of SecureBridge according to your description and inform you about the results.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Wed 12 Oct 2016 13:52
by tcaduto12068
Thanks.
It also seems to happen after clients disconnect, and sometimes, but not always, it ends up with the "can't create event" issue I mentioned before.
When this happens the entire process becomes useless: clients can connect and the checkpassword event fires, but the client is then disconnected and afterclientdisconnect causes an access violation.

I also had the resource monitor running during one of these episodes and nothing was unusual. Memory usage was good and about 280 handles were in use, which was far fewer than other apps.

I also did some digging on the TAsyncThread.Create issue that ultimately pops up. GetLastError is returning error 161, which is "Specified path is invalid". That kind of makes no sense, but the Free Pascal docs mention error 161 as "device read failure".
Also, if I ignore the GetLastError result in TAsyncThread.Create, it will usually go into 100% CPU utilization.

It's really frustrating, as sometimes I can transfer thousands of files before any errors or exceptions pop up, and they are all external to any code I added to events, as they always pop up in the assembler window.

Just for kicks I may try and compile in FPC 2.6.4 instead of 3.0 and see if that makes any difference.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Wed 12 Oct 2016 15:49
by tcaduto12068
I was able to capture some info:

Code: Select all

000000010014017A 488b45e0                 mov    -0x20(%rbp),%rax
000000010014017E 488b4060                 mov    0x60(%rax),%rax
0000000100140182 4883780800               cmpq   $0x0,0x8(%rax)
0000000100140187 7512                     jne    0x10014019b <CREATE+395>
..\..\..\Documents\laz_components\sbridge\Source\ScVio.pas:374  msg:=': Event Handle = Nil' else msg:= 'Handle pointer:'+intTostr(integer(FDataAvailable.Handle));
0000000100140189 488d15c0051300           lea    0x1305c0(%rip),%rdx        # 0x100270750 <_$SCVIO$_Ld1>
0000000100140190 488d4dd0                 lea    -0x30(%rbp),%rcx
0000000100140194 e8f784ecff               callq  0x100008690 <fpc_ansistr_assign>
0000000100140199 eb2e                     jmp    0x1001401c9 <CREATE+441>
000000010014019B 488b45e0                 mov    -0x20(%rbp),%rax
000000010014019F 488b4060                 mov    0x60(%rax),%rax
00000001001401A3 8b5008                   mov    0x8(%rax),%edx
00000001001401A6 488d4db8                 lea    -0x48(%rbp),%rcx
00000001001401AA e861edeeff               callq  0x10002ef10 <SYSUTILS_$$_INTTOSTR$LONGINT$$ANSISTRING>
00000001001401AF 4c8b45b8                 mov    -0x48(%rbp),%r8
00000001001401B3 488d15c6051300           lea    0x1305c6(%rip),%rdx        # 0x100270780 <_$SCVIO$_Ld2>
00000001001401BA 488d4dd0                 lea    -0x30(%rbp),%rcx
00000001001401BE 41b900000000             mov    $0x0,%r9d
00000001001401C4 e8e785ecff               callq  0x1000087b0 <fpc_ansistr_concat>
..\..\..\Documents\laz_components\sbridge\Source\ScVio.pas:375  raise Exception.Create(format('%s: Error Code %d, Error MSG:%s',[SCannotCreateEvent,lasterrorcode,SysErrorMessage(lasterrorcode)+msg]));
00000001001401C9 488b05b8380800           mov    0x838b8(%rip),%rax        # 0x1001c3a88 <RESSTR_$SCVIO_$$_SCANNOTCREATEEVENT+8>
00000001001401D0 48894590                 mov    %rax,-0x70(%rbp)
00000001001401D4 48c745880b000000         movq   $0xb,-0x78(%rbp)
00000001001401DC 486345d8                 movslq -0x28(%rbp),%rax
00000001001401E0 488945a0                 mov    %rax,-0x60(%rbp)
00000001001401E4 48c7459800000000         movq   $0x0,-0x68(%rbp)
00000001001401EC 488d4d80                 lea    -0x80(%rbp),%rcx

It's showing that TEvent.Create had a handle. This happened after 10 clients disconnected and it sat idle for a while.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Fri 14 Oct 2016 03:49
by tcaduto12068
I think I solved the stability issue, and it all boils down to the TEvent.Create calls that are used all over the SecureBridge source code.

You guys are calling TEvent.Create everywhere and leaving the name parameter blank, i.e. ''.
There appears to be an issue in Windows: if you leave that name parameter blank, Windows will eventually fail to uniquely name the events, hence the invalid path error 161.

So what I did was to do my own unique naming for the TEvents that were causing issues, e.g. in TAsyncThread.Create:

var
  guid: TGUID;
begin
  WriteLn('created');  // debug output
  CreateGUID(guid);
  // Pass a GUID string as the event name instead of ''
  FDataAvailable := TEvent.Create(nil, True, False, GUIDToString(guid));


This stopped the 161 error in this particular tevent.create.

I was able to disconnect and reconnect many many more times, then a new one started popping up in:
constructor TReceiveBuffer.Create(ChunkSize: Integer);

So I did the same thing with the GUID and that one went away as well.
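
For anyone wanting to try the same workaround in isolation, here is a minimal, self-contained sketch of the GUID naming idea. UniqueEventName is a hypothetical helper, not part of SecureBridge, and the sketch only demonstrates the naming; it does not reproduce the 161 error.

```pascal
program GuidEventSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  SysUtils, SyncObjs;

// Hypothetical helper (not part of SecureBridge): returns a fresh GUID string
function UniqueEventName: string;
var
  G: TGUID;
begin
  // In FPC, CreateGUID returns 0 on success
  if CreateGUID(G) <> 0 then
    raise Exception.Create('CreateGUID failed');
  Result := GUIDToString(G);
end;

var
  NameA, NameB: string;
  EventA, EventB: TEvent;
begin
  NameA := UniqueEventName;
  NameB := UniqueEventName;
  // Same arguments as in the thread, with a GUID in place of ''
  EventA := TEvent.Create(nil, True, False, NameA);
  try
    EventB := TEvent.Create(nil, True, False, NameB);
    try
      WriteLn('event A name: ', NameA);
      WriteLn('event B name: ', NameB);
      WriteLn('names differ: ', NameA <> NameB);
    finally
      EventB.Free;
    end;
  finally
    EventA.Free;
  end;
end.
```

Keeping the name generation in one helper would make it easy to apply the same change at every TEvent.Create call site.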

I did a search on TEvent.Create and it's used a LOT. I am thinking Windows must have some kind of limit on the number of unnamed events, or there is a bug in how it names them internally.

I am going to do more testing, but I think this may be the solution to the stability issues.

Comments?

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Fri 14 Oct 2016 04:12
by tcaduto12068
After I added the GUID to the two events, another one popped up in:
constructor TSsh2Channel.Create(con: TSshConnection; chType: TScChannelType;

Adding the GUID to the first two allowed me to make more concurrent and fast connections than ever before, which I am sure then caused more use of TSsh2Channel.Create, which in turn failed because of whatever issue is behind the '' name problem.

Re: SecureBridge SSH/SFTP Server Stability issues

Posted: Fri 14 Oct 2016 04:53
by tcaduto12068
Just an FYI, below are all the locations where I added the GUID to TEvent.Create.
This really does seem to be the solution. I was able to transfer tens of thousands of files, all the while connecting and disconnecting 10 concurrent instances in FileZilla.

While I was really hammering it I got a TCriticalSection error, so there are still some issues, but this is a huge improvement: no 100% CPU usage and no "can't create event" errors that required the server to be restarted. Tomorrow I will try compiling as 32-bit and see how that works. It makes sense now that 64-bit caused fewer issues, because whatever mechanism was creating the unnamed events had more room to create the unique names under the hood.

C:\Users\tcaduto\Documents\laz_components\sbridge\Source\ScClient.pas
ScClient.pas (128,8) guid:tguid;
C:\Users\tcaduto\Documents\laz_components\sbridge\Source\ScReceiveBuffer.pas
ScReceiveBuffer.pas (69,9) guid:TGUID;
C:\Users\tcaduto\Documents\laz_components\sbridge\Source\ScSSH2Channel.pas
ScSSH2Channel.pas (78,8) guid:TGUID;
C:\Users\tcaduto\Documents\laz_components\sbridge\Source\ScVio.pas
ScVio.pas (362,8) guid:TGUID;