Page 1 of 1

Linq Group by performance

Posted: Thu 03 Nov 2011 01:59
by michelb1999
I'm using Entity framework with the latest version of DotConnect for MySql (6.50.237.0) in C# 4.0

I'm trying a simple groupby query on a very large table (100,000,000 records) with Linq

var imprs = from axi in db.aximpressions
where axi.Axd_SiteID == 2
group axi by axi.Axd_BannerID
into g
select new
{
g.Key,
Sum = g.Sum(i => i.Axi_Impressions)
};

The query takes a few minutes to execute (and 5.5 gigs of ram)

Here is the generated .ToTraceString() SQL

SELECT GroupBy1.K1 AS Axd_BannerID,
GroupBy1.A1 AS C1
FROM
(
SELECT Extent1.Axd_BannerID AS K1,
Sum(Extent1.Axi_Impressions) AS A1
FROM
(
SELECT aximpression.Axi_ID AS Axi_ID,
[...ALL table other columns...]
FROM adxpose.aximpression AS aximpression
) AS Extent1
WHERE Extent1.Axd_SiteID = 2
GROUP BY Extent1.Axd_BannerID
) AS GroupBy1

The innermost query select ALL records (yes all 100,000,000 records)
Then another query filter on the Axd_SiteID field and do the grouping and sum

Now I understand why it takes forever
But why the generated query isn't something simple like this ?

select axd_bannerid,
sum(axi_impression)
from AxImpression
where axd_siteid = 2
group by axd_bannerid

This execute in a 3-4 seconds

Thanks!
Michel

Posted: Thu 03 Nov 2011 11:09
by Shalex
It seems like you have DefiningQuery in the SSDL part of your *.edmx: http://msdn.microsoft.com/en-us/library/bb738450.aspx. Try removing it.

Posted: Thu 03 Nov 2011 12:37
by michelb1999
That fixed my problem

The DefiningQuery is generated because the table doesn't have a primary key and it can't since I'm using a column database (InfoBright)

Now I need to find a way to make sure there will never be any DefiningQuery in the edmx

Thanks!
Michel

Posted: Fri 04 Nov 2011 15:08
by Shalex
michelb1999 wrote:Now I need to find a way to make sure there will never be any DefiningQuery in the edmx
We recommend using Entity Developer (*.edml) which never creates DefiningQuery automatically.