Linq Group by performance

Discussion of open issues, suggestions and bugs regarding ADO.NET provider for MySQL
Post Reply
michelb1999
Posts: 2
Joined: Thu 03 Nov 2011 01:22

Linq Group by performance

Post by michelb1999 » Thu 03 Nov 2011 01:59

I'm using Entity framework with the latest version of DotConnect for MySql (6.50.237.0) in C# 4.0

I'm trying a simple groupby query on a very large table (100,000,000 records) with Linq

var imprs = from axi in db.aximpressions
where axi.Axd_SiteID == 2
group axi by axi.Axd_BannerID
into g
select new
{
g.Key,
Sum = g.Sum(i => i.Axi_Impressions)
};

The query takes a few minutes to execute (and 5.5 gigs of ram)

Here is the generated .ToTraceString() SQL

SELECT GroupBy1.K1 AS Axd_BannerID,
GroupBy1.A1 AS C1
FROM
(
SELECT Extent1.Axd_BannerID AS K1,
Sum(Extent1.Axi_Impressions) AS A1
FROM
(
SELECT aximpression.Axi_ID AS Axi_ID,
[...ALL table other columns...]
FROM adxpose.aximpression AS aximpression
) AS Extent1
WHERE Extent1.Axd_SiteID = 2
GROUP BY Extent1.Axd_BannerID
) AS GroupBy1

The innermost query select ALL records (yes all 100,000,000 records)
Then another query filter on the Axd_SiteID field and do the grouping and sum

Now I understand why it takes forever
But why the generated query isn't something simple like this ?

select axd_bannerid,
sum(axi_impression)
from AxImpression
where axd_siteid = 2
group by axd_bannerid

This execute in a 3-4 seconds

Thanks!
Michel

Shalex
Site Admin
Posts: 9543
Joined: Thu 14 Aug 2008 12:44

Post by Shalex » Thu 03 Nov 2011 11:09

It seems like you have DefiningQuery in the SSDL part of your *.edmx: http://msdn.microsoft.com/en-us/library/bb738450.aspx. Try removing it.

michelb1999
Posts: 2
Joined: Thu 03 Nov 2011 01:22

Post by michelb1999 » Thu 03 Nov 2011 12:37

That fixed my problem

The DefiningQuery is generated because the table doesn't have a primary key and it can't since I'm using a column database (InfoBright)

Now I need to find a way to make sure there will never be any DefiningQuery in the edmx

Thanks!
Michel

Shalex
Site Admin
Posts: 9543
Joined: Thu 14 Aug 2008 12:44

Post by Shalex » Fri 04 Nov 2011 15:08

michelb1999 wrote:Now I need to find a way to make sure there will never be any DefiningQuery in the edmx
We recommend using Entity Developer (*.edml) which never creates DefiningQuery automatically.

Post Reply