Locate speed

Discussion of open issues, suggestions and bugs regarding UniDAC (Universal Data Access Components) for Delphi, C++Builder, Lazarus (and FPC)
Post Reply
sandy771
Posts: 194
Joined: Tue 22 May 2007 13:57

Locate speed

Post by sandy771 » Fri 30 Jul 2010 13:07

Using a UniQuery to connect to a sqlite database with about 6 Million unique sha1 values in a column, the column is fully indexed

I am using

if(ShaHash->Locate("sha1", sha1, TLocateOptions()))
{
//do something
}

to determine whether various values are in the table.

The first time through the loop Locate takes an age to execute (minutes) on repeat calls the function returns immediately. Is this normal?

DepSoft
Posts: 20
Joined: Tue 27 Jul 2010 03:01
Location: Western Australia

Post by DepSoft » Fri 30 Jul 2010 15:44

6M records is far too many to be bringing to the client and performing that sort of search with any kind of data access mechanism. To do that it may have to bring the entire 6M records to the client before finding the one you're interested in.

Can you not simply set your query to:

Code: Select all

select * from table where SHA1 = SomeValue
and examine rows of interest individually that way?
i.e. Let the server do what its best at i.e. returning as few rows as possible to the client that match some criteria you provide.

Regards, Paul.

sandy771
Posts: 194
Joined: Tue 22 May 2007 13:57

Post by sandy771 » Sat 31 Jul 2010 11:13

The problem is that I am doing multiple (10's of thousands) look ups on the hashes (ultimately about 16Million of them).

Spped is not a major issue, i.e. expect the proces to take 10's of minutes - that is fine as it is not run that often.

The sqlite database is a local file.

sandy771
Posts: 194
Joined: Tue 22 May 2007 13:57

Post by sandy771 » Sun 01 Aug 2010 13:22

Odd that on the same data set a select * from x where sha1 = ? I can run through 100K records looking up each hash in a table of 8M records in a little over a minute

DepSoft
Posts: 20
Joined: Tue 27 Jul 2010 03:01
Location: Western Australia

Post by DepSoft » Mon 02 Aug 2010 03:33

I think that sounds about right. Selecting and returning 1 record takes a matter of milliseconds when the main table is indexed correctly.

When you say 'ultimately 16 million'; is that number of records in the main table or number of lookups?

IIRC, you said that the DB was local to the client.

Using a UniQuery where UniDirectional is False (needed to allow locate to work by moving forwards and backwards through the data):
  • The DB must first assemble all the records to fill the query (requires lots of memory for DB).

    Then the UniQuery gets populated (another chunk of memory to hold all the records again).

    Your Locate has to search all these records for your value.
When you query the DB for a specific record instead:
  • DB uses its index/keys (very optimal) and returns 1 record very quickly with not much memory used.

    UniQuery gets the one record (again hardly any memory and fast).

    No further searching/locating required (you have answered your question that the record exists) and you can modify and post changes as required.
If you are only ever going to perform a number of specific queries less than about 25% of the total number of records in the main table then I would think it will be quicker than loading all the table and locating on that. You would have to experiment with times and number of queries.

Also, you could leave the index on the main table to save time rebuilding and prevent duplicates, etc.

Do you have to modify the records that exist or do you do something else if they exist?

Regards, Paul.

Dimon
Devart Team
Posts: 2910
Joined: Mon 05 Mar 2007 16:32

Re: Locate speed

Post by Dimon » Mon 02 Aug 2010 08:49

sandy771 wrote:The first time through the loop Locate takes an age to execute (minutes) on repeat calls the function returns immediately. Is this normal?
The point is that the FetchAll property is set to False by default. In this case when you call Locate for the first time UniDAC fetches all unfetched records and just after this begins searching. To solve the problem set the FetchAll option to True, like this:

Code: Select all

  UniQuery.SpecificOptions.Values['FetchAll'] := 'True';
Also the Locate function works faster when dataset is locally sorted on the KeyFields fields. Local dataset sorting can be set with the TMemDataSet.IndexFieldNames property.

Post Reply