Intermittent failure to connect with SSL client certificate - Cannot load client private key

Discussion of open issues, suggestions and bugs regarding ADO.NET provider for PostgreSQL
NinjaNichols
Posts: 4
Joined: Tue 28 Feb 2017 16:09

Post by NinjaNichols » Tue 28 Feb 2017 18:44

We recently switched our ASP.NET web apps from password-based authentication to SSL client certificates. Now we are seeing intermittent failures in our logs when connecting to the database.

System.Data.Entity.Core.EntityException: The underlying provider failed on Open. ----> Devart.Data.PostgreSql.PgSqlException: Cannot load client private key. ----> Devart.Security.SSL.u: Cannot load client private key. ----> System.Security.Cryptography.CryptographicException: Couldn't acquire crypto service provider context.
   at Devart.Cryptography.al.a(IntPtr& A_0, String A_1)
   at Devart.Cryptography.al.b()
   at Devart.Security.g.a(Byte[] A_0)
   at Devart.Security.g.h(String A_0)
   at Devart.Common.af.a(String A_0, String A_1)
   --- End of inner ExceptionDetail stack trace ---
   at Devart.Common.af.a(String A_0, String A_1)
   at Devart.Data.PostgreSql.y..ctor(String A_0, Int32 A_1, Encoding A_2, Int32 A_3, SslOptions A_4, ProxyOptions A_5, Int32 A_6)
   --- End of inner ExceptionDetail stack trace ---
   at Devart.Data.PostgreSql.w.y()
   at Devart.Data.PostgreSql.w..ctor(PgSqlConnectionOptions A_0)...).

The issue seems to be more or less random and typically affects at most 1 or 2 out of ~10 web apps at a time. Sometimes restarting the app fixes it, and sometimes not. All web apps run under the same service account. It might be correlated with high database load, but we’re not sure.

For reference, we are using Entity Framework with a connection string that looks like this:

<add name="DatabaseEntities" connectionString="metadata=res://*/DatabaseDB.csdl|res://*/DatabaseDB.ssdl|res://*/DatabaseDB.msl;provider=Devart.Data.PostgreSql;provider connection string='User Id=myuser; Host=myserver;Database=mydatabase;Schema=dbo;SslMode=require;Ssl Key=postgres.key;Ssl Cert=postgres.crt; Persist Security Info=True'" providerName="System.Data.EntityClient" />

Steps taken to troubleshoot:
  • Granted permissions on C:\ProgramData\Microsoft\Crypto\RSA to the web app service account.
  • Tried deleting keys from C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys.
  • Ran ProcMon to look for access-denied, file-lock, or similar errors, but found nothing.

Additional system information:
  • Database server: PostgreSQL 9.4 on CentOS
  • Web apps: 64-bit, running on IIS 8.5 on Windows Server 2012 R2
  • Driver: dotConnect for PostgreSQL Professional 7.3.342.0

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Post by Pinturiccio » Fri 03 Mar 2017 12:21

This can happen when connections are opened frequently from several threads: if two connections are opened simultaneously, the certificate file may be locked by one of them. The "Max Pool Size" connection string parameter defaults to 100. Try increasing its value.
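
For anyone who prefers to adjust the setting in code rather than in the config file, here is a minimal sketch. It uses the generic System.Data.Common.DbConnectionStringBuilder from the .NET base class library, not a Devart-specific API. The helper name WithMaxPoolSize and the value 200 are illustrative assumptions; only the parameter name "Max Pool Size" comes from this thread.

```csharp
using System.Data.Common;

public static class PoolSettings
{
    // Hypothetical helper: returns the provider connection string with
    // "Max Pool Size" set to the given value. Existing keys are kept.
    public static string WithMaxPoolSize(string providerConnectionString, int maxPoolSize)
    {
        var builder = new DbConnectionStringBuilder
        {
            ConnectionString = providerConnectionString
        };
        // Keys are matched case-insensitively, so an existing entry is overwritten.
        builder["Max Pool Size"] = maxPoolSize;
        return builder.ConnectionString;
    }
}
```

The result would be passed to the PgSqlConnection constructor (or substituted into the Entity Framework provider connection string). Pick a value that the database server can actually sustain.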

NinjaNichols
Posts: 4
Joined: Tue 28 Feb 2017 16:09

Post by NinjaNichols » Wed 29 Mar 2017 23:08

We increased the number of connections in the pool to the max that our database server can reasonably handle. We also turned on the “Validate Connection” flag in the connection string. The issue is still happening occasionally, although perhaps less frequently than before -- it’s a little hard to tell since it’s so intermittent.

Additionally, I’ve been trying to rule out threading issues in our own code. I did find and fix a few instances where we accessed the Entity Framework DbContext in a non-thread-safe way. Unfortunately that didn’t help the “Couldn't acquire crypto service provider context” error, either.

Today, I finally managed to capture one of these events while running Process Monitor, and verified that an ACCESS DENIED error is being thrown when trying to read a file under C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys.

15:00:14.3549069	w3wp.exe	7412	CreateFile	C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\1093ed8ee0d8c666e6cc4d1a224f9326_aae40b3a-ad44-409e-88c5-c23c5125b58a	ACCESS DENIED	Desired Access: Generic Read, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Non-Directory File, Attributes: n/a, ShareMode: Read, AllocationSize: n/a

It looks like it gives up after a few tries and never attempts to read the file again, which is why nothing showed up when I originally ran Process Monitor after the issue first appeared. The only way to get it working again seems to be restarting the app, which causes it to access a file with a different GUID.

My first thought was that it’s some kind of ACL issue, but I verified that the owner of the file was our IIS service account and that it had Read permissions. The only thing that looked odd was that the file was created and last modified over a month ago. Is it possible that these GUIDs aren’t random and could be conflicting with previous instances? There are now hundreds of files in the C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys folder and the vast majority have creation dates after we switched to using SSL client certificate authentication.
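
A rough sketch of how the same check could be scripted instead of clicking through file properties: it lists each file in a folder with its creation time, whether the current identity can open it for reading, its owner, and its ACL entries. The class and method names are made up for illustration, and it assumes Windows with the standard System.Security.AccessControl types. It has to run inside the web app so that it executes under the app pool identity.

```csharp
using System;
using System.IO;
using System.Security.AccessControl;
using System.Security.Principal;

public static class MachineKeyAudit
{
    // True if the current identity can open the file for reading;
    // false if the open is rejected with an access-denied error.
    public static bool CanRead(string path)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                return true;
            }
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }

    // Prints creation time, readability, owner and ACL entries for every file in the folder.
    public static void Dump(string folder)
    {
        foreach (string path in Directory.GetFiles(folder))
        {
            try
            {
                Console.WriteLine("{0}  created {1:u}  readable: {2}",
                    Path.GetFileName(path), File.GetCreationTimeUtc(path), CanRead(path));

                FileSecurity acl = new FileInfo(path).GetAccessControl();
                Console.WriteLine("  owner: {0}", acl.GetOwner(typeof(NTAccount)));
                foreach (FileSystemAccessRule rule in acl.GetAccessRules(true, true, typeof(NTAccount)))
                {
                    Console.WriteLine("  {0} {1} {2}",
                        rule.IdentityReference, rule.AccessControlType, rule.FileSystemRights);
                }
            }
            catch (Exception ex)
            {
                // Reading the ACL can itself be denied, or a SID may not map to an account name.
                Console.WriteLine("  {0}: details unavailable: {1}", Path.GetFileName(path), ex.Message);
            }
        }
    }
}
```

Usage would be something like MachineKeyAudit.Dump(@"C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys"), with the output written to a log instead of the console in a real web app.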

Any ideas for what to try next? I know our dotConnect driver is a couple of versions behind (7.3.342.0), and I could possibly upgrade if there have been any subsequent updates that could fix this issue. I'm also wondering if there's a way to safely clean up these MachineKey files in case there's some kind of conflict with old ones.

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Post by Pinturiccio » Fri 31 Mar 2017 16:38

MachineKeys files are created by the operating system. If a user with fewer privileges than the user who created a file tries to access it, that user gets an access denied error.

Perhaps some of the applications use Network Service as the application pool identity. The following links may be useful:
https://mtrinder.wordpress.com/2011/11/ ... b-service/
https://www.iis.net/learn/manage/config ... identities

Try specifying a separate application pool for each application. You can also try changing the identity from Network Service to another identity.

You can also try deleting the files from MachineKeys. New files are created automatically on first access, and the user who accesses them first determines the privileges required to access them.

NinjaNichols
Posts: 4
Joined: Tue 28 Feb 2017 16:09

Post by NinjaNichols » Tue 04 Apr 2017 14:29

Thanks for the info. I took a look at those links. We already do most of the suggestions; each app has its own app pool that runs under a custom Identity (a service account we created just for the IIS processes).

I’ve spent the day trying to isolate the issue and I think I see what’s going on. The underlying MachineKey file is being created with the wrong ACL permissions, which prevents the file from ever being modified or removed. My guess is that this is due to the combination of connecting to Postgres with SSL client certificates and using an ASP.NET application running under a custom service account identity (instead of the default ApplicationPoolIdentity).

To help debug, I created a simple ASP.NET web app that runs a basic query:

string myConnStr = ConfigurationManager.ConnectionStrings["MyConnectionString"].ConnectionString;
using (var pgSqlConnection = new PgSqlConnection(myConnStr))
{
    pgSqlConnection.Open();
    var cmd = pgSqlConnection.CreateCommand();
    cmd.CommandText = "SELECT COUNT(*) FROM pg_stat_activity;";
    var ret = Convert.ToInt32(cmd.ExecuteScalar());
    return "There are currently " + ret + " active database connections.";
}

I then created a new web site and a new application pool in IIS, leaving all the defaults except for the Application Pool Identity which I changed from ApplicationPoolIdentity to our custom service account (MYDOMAIN\IISServiceAccount).

When I hit the web app and it connects to the database, a new key file is created in the C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys folder. If I right-click and view the Properties, then under Security I see that the owner is my service account (as expected), but the OWNER RIGHTS is only set to “Read permissions”. However, there is another ACL entry that grants Full Control to the IIS ApplicationPoolIdentity account ("IIS AppPool\{App pool name}") -- despite the fact that I specifically configured the application pool to use a custom Identity instead.

You can see this a little more clearly in the last entry of the PowerShell Get-Acl output:

PS> Get-Acl "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\2043c0bc8afc723bafb54183f657bf07_aae40b3a-ad44-409e-88c5-c23c5125b58a" | format-list
Path   : Microsoft.PowerShell.Core\FileSystem::C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\2043c0bc8afc723bafb54183f657bf07_aae40b3a-ad44-409e-88c5-c23c5125b58a
Owner  : MYDOMAIN\IISServiceAccount
Group  : MYDOMAIN\Domain Users
Access : OWNER RIGHTS Allow  ReadPermissions
         NT AUTHORITY\SYSTEM Allow  FullControl
         BUILTIN\Administrators Allow  FullControl
         IIS APPPOOL\CrytoWebAppPool Allow  FullControl

I’ve noticed that if I change the app pool Identity back to ApplicationPoolIdentity, then the machine key file in the C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys folder is cleaned up after stopping the app pool. If I leave the Identity set to my custom service account, the file is never removed and a new one is created each time the app pool recycles. As the number of these files grows, the probability of the app accessing one of the existing key containers increases. When that happens, it gets an Access Denied, and throws the “Couldn't acquire crypto service provider context” exception.

I’ve managed to get the exception to happen almost every time (instead of randomly once per week) by manually creating a bunch of machine key files with the same Key Container naming scheme, which I reverse engineered from the ASCII string within the existing files.

I ran this code from within the web app so that the resulting keys would have the same owner (MYDOMAIN\IISServiceAccount) as the real keys created automatically.

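// Requires: using System.Security.Cryptography;
// Repro: writes a large number of private keys to the machine key folder.
// Container names follow the scheme "{GUID}{DOMAIN}{ACCOUNT NAME}{5-DIGIT INTEGER}"
// so that they conflict with the names used for the real keys.
const string baseContainerName = "{48959A69-B181-4cdd-B135-7565701307C5}MYDOMAINServiceAccountName";
for (int i = 1; i < 30000; i++)
{
    var cspParameters = new CspParameters
    {
        KeyContainerName = baseContainerName + i,
        Flags = CspProviderFlags.UseMachineKeyStore
    };
    using (var provider = new RSACryptoServiceProvider(cspParameters))
    {
        // Nothing to do here: creating the provider for a named container is
        // what writes the key file, and disposing it leaves the file on disk.
    }
}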

Afterwards, every database connection attempt failed with the "Couldn't acquire crypto service provider context" exception.

So it now seems pretty clear that the keys are being created with too few permissions to be disposed of later, and eventually build up to the point that they begin to prevent future connections. Trouble is, I can’t tell why the underlying machine key is being created with an ACL that uses the name of the Application Pool instead of using the service account Identity.

It doesn’t seem like the file should be owned by one user account but give full control to another (in this case, unused) user account. Of course, changing the app pool Identity back to ApplicationPoolIdentity fixes the issue in my sample app, but not in the production system where we do want the apps to run under a managed service account.

Thanks for your help so far. Any additional guidance would be greatly appreciated.
- Steven

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Post by Pinturiccio » Thu 06 Apr 2017 16:43

We use the algorithm recommended by Microsoft. If a key container for the user already exists in the C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys folder and the user can open it, we open it. If access to it is denied, a new file is created.

We created a user account that is used only by IIS. Then we created several application pools and set each application pool identity to this new user. Each website, using its own application pool, creates a separate file in C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys. When we unload an application pool, the corresponding file is deleted.

The behavior probably depends on the privileges of the user specified as the application pool identity. We don’t know exactly how IIS assigns privileges to that user, so you will need to contact Microsoft support about this.

NinjaNichols
Posts: 4
Joined: Tue 28 Feb 2017 16:09

Post by NinjaNichols » Thu 13 Apr 2017 18:47

I have discovered how IIS assigns privileges. IIS 7 introduced a feature called App Pool Isolation (http://www.adopenstatic.com/cs/blogs/ke ... 15759.aspx), which is designed to prevent two apps from reading each other’s config files even when they both run under the same user account. Basically, the IIS folks wanted to protect against a malicious app stealing credentials from other apps when all the app pools run under the same account, such as NETWORK SERVICE.

The way they did this is by creating a special group SID (“IIS APPPOOL\{name of app pool}”) that is injected into the IIS process (w3wp.exe). They then set the permissions for the process such that the user gets almost no permissions, while the special IIS APPPOOL\{name of app pool} SID gets Read and Write permission.

The machine key that gets generated inherits its ACL from the w3wp.exe process.

Normally, this would be perfectly fine, since we don’t want our apps to be able to read each other’s config files or private machine keys. The issue however is that sometimes one app gets assigned a machine key that has already been used by another app.

The machine keys are accessed by a call to the CryptAcquireContext function in the Microsoft CryptoAPI (https://msdn.microsoft.com/en-us/librar ... 5%29.aspx). This call takes a Key Container Name as an argument and uses it to look up the corresponding machine key file.

The Devart dotConnect library is a black box to me, but as far as I can tell, this Key Container Name is generated by Devart and always follows the format {GUID}{DOMAIN}{Username}{Process ID}, where DOMAIN and Username are for the account that the code is running under.

After spending a lot of time playing around with the key containers, I’ve found that I can prevent an app from connecting to the database simply by first having another app in a different app pool call the CryptoAPI using a key container name that has the first app’s Process ID.

Of course, two processes can never have the same PID at the same time, but these machine keys are persistent and eventually the PID gets re-used. When this happens, an app in a different app pool can get the same Key Container Name as a previously created machine key. For example, App Pool 2 tries to read and then re-create a machine key that was originally made by App Pool 1. This fails since Machine Key 1’s ACL only allows access by App Pool 1.
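
The PID re-use scenario can be modelled without touching the real key store. The sketch below is a simulation only: the naming scheme {GUID}{DOMAIN}{Username}{Process ID} is what was inferred in this thread, not a documented Devart format, and the dictionary stands in for the MachineKeys folder, with the stored value playing the role of the one app pool SID that the file's ACL admits. All class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ContainerNameCollision
{
    // Static GUID taken from the test code earlier in this thread.
    const string StaticGuid = "{48959A69-B181-4cdd-B135-7565701307C5}";

    // Assumed naming scheme: {GUID}{DOMAIN}{Username}{Process ID}.
    public static string BuildName(string domain, string account, int pid)
    {
        return StaticGuid + domain + account + pid;
    }

    // Simulated key store: container name -> app pool that created it.
    // Returns true if the pool may use the container; false models ACCESS DENIED.
    public static bool TryAcquire(IDictionary<string, string> store, string containerName, string appPool)
    {
        string creator;
        if (!store.TryGetValue(containerName, out creator))
        {
            store[containerName] = appPool; // first caller creates the container
            return true;
        }
        // Same account, but App Pool Isolation only admits the creating pool.
        return creator == appPool;
    }
}
```

With one shared service account, a second app pool whose worker process happens to receive a recycled PID builds exactly the same name as the leftover container, so TryAcquire returns false for it. That matches the failure pattern described above: rare at first, and more likely as leftover files accumulate.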

In conclusion, the issue is the combination of IIS App Pool Isolation, when multiple app pools run under the same account, and a Key Container Name that can be re-used between app pools. This seems like a pretty common scenario, and we verified that it is still present in the latest dotConnect driver (7.8.862.0).

Questions:

1. Is there any way to make the machine keys non-persistent? If the key was only in memory, this wouldn’t be an issue.

2. Can I somehow force the machine key to always be removed? I did some more testing and realized that the keys sticking around after unloading the app pool probably isn’t a permission issue like I first thought. I found that explicitly calling Dispose() on every PgSqlConnection usually, but not always, removes the machine key. I haven’t looked into it too much, but it seemed like the key was removed less frequently when there were a lot of connections (like 25) in the connection pool. I’m also worried about the case when IIS forcibly stops the app pool without letting it clean up first.

3. Is there a way to change the Key Container Name such that it can’t get reused in the way I described above?

Thanks again for your help. Tracking down all these contributing factors has been a real adventure. Hopefully my explanation makes sense.
- Steven

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Post by Pinturiccio » Tue 25 Apr 2017 14:12

We have answered you via e-mail.

wpojon
Posts: 1
Joined: Fri 14 Jun 2013 04:16

Post by wpojon » Wed 03 May 2017 14:20

So what is the solution? I get this issue almost once a week on Azure servers.

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Post by Pinturiccio » Thu 04 May 2017 09:22

We have made some changes in dotConnect for PostgreSQL:
1. Previously, the name of the MachineKeys container contained a static GUID, {48959A69-B181-4cdd-B135-7565701307C5}. Now a new GUID is generated when the application loads. This avoids conflicts when a container is created for the same account by a process whose ID was used before.

2. We now remove the container immediately after establishing a connection to the server (in conn.Open()) instead of waiting for the process to exit. This reduces the chance of files piling up. The container exists only during authentication, which is a very short period of time, so you probably won’t even notice the file.
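
To illustrate the two ideas in general terms, here is a sketch using only the standard .NET RSACryptoServiceProvider API. It is not the dotConnect implementation, which is not public; the class name, method names and naming scheme are assumptions made for the example. The first part generates a GUID once per application load, and the second deletes the key container as soon as the work is done by setting PersistKeyInCsp to false before the provider is disposed. The key-store part is Windows only.

```csharp
using System;
using System.Security.Cryptography;

public static class EphemeralContainer
{
    // Generated once per application load, so a recycled PID under the
    // same account no longer produces a previously used container name.
    static readonly Guid InstanceId = Guid.NewGuid();

    // Hypothetical naming scheme: {per-load GUID}{DOMAIN}{Username}{Process ID}.
    public static string BuildName(string domain, string account, int pid)
    {
        return InstanceId.ToString("B") + domain + account + pid;
    }

    // Opens (or creates) a machine-level key container, runs the action,
    // then removes the container instead of leaving the file behind.
    public static void UseAndDelete(string containerName, Action<RSACryptoServiceProvider> action)
    {
        var csp = new CspParameters
        {
            KeyContainerName = containerName,
            Flags = CspProviderFlags.UseMachineKeyStore
        };
        using (var rsa = new RSACryptoServiceProvider(csp))
        {
            try
            {
                action(rsa);
            }
            finally
            {
                // With persistence turned off, disposing the provider deletes the container.
                rsa.PersistKeyInCsp = false;
            }
        }
    }
}
```

The try/finally means the container is removed even if the action throws, although a process that is killed outright can still leave a file behind. That is why the per-load GUID is worth having as well.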

We will post here when the corresponding build of dotConnect for PostgreSQL is available for download.

Pinturiccio
Devart Team
Posts: 2420
Joined: Wed 02 Nov 2011 09:44

Post by Pinturiccio » Fri 26 May 2017 10:05

dotConnect for PostgreSQL 7.9 has been released!
It can be downloaded from http://www.devart.com/dotconnect/postgr ... nload.html (trial version) or from Customer Portal (for users with a valid subscription only).
For more information, please refer to viewtopic.php?t=35438
