Can performance of data comparison be improved?
Posted: Wed 21 Aug 2019 10:32
I am trying to compare two relatively large (150 GB) tables.
Is there any way to avoid querying all of the data?
One idea: I've seen other tools (happy to give examples separately) that compare data in blocks/chunks based on the primary key, calculate a hash/checksum for each chunk on the MySQL server itself, and then query the underlying data for comparison only when the checksums differ (see the sketch below).
In such a case, I realize the "show identical" feature would be disabled since the dbForge application would not have that data. But for such a large table, it would not make sense to show that data anyway.
This approach would substantially reduce the amount of data transfer across the network, which can be a huge benefit (in terms of both bandwidth and time) if the databases being compared are in different locations.
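For illustration, here is a minimal sketch of the kind of per-chunk query I have in mind, run identically on both servers. The table name, columns, and chunk bounds are made up for the example, and NULL handling is simplified compared to what a real tool would need:

```sql
-- Hypothetical per-chunk checksum (illustrative only; not dbForge's actual behavior).
-- XOR of per-row CRC32 values is order-independent, so it works on an unordered chunk.
SELECT
    COUNT(*) AS row_count,
    BIT_XOR(CRC32(CONCAT_WS('#', id, customer_id, total, updated_at))) AS chunk_checksum
FROM orders                       -- assumed table and columns for the example
WHERE id BETWEEN 1 AND 100000;    -- one primary-key chunk
```

Only chunks whose (row_count, chunk_checksum) pairs differ between the two servers would then need to be fetched and compared row by row.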
Thank you.