MySQL and ElasticSearch Pagination Solutions

MySQL Pagination

Basic Principle of Pagination

SELECT * FROM test ORDER BY id DESC LIMIT 10000, 20;

LIMIT 10000, 20 means scanning a total of 10020 rows that satisfy the condition, discarding the first 10000 rows, and returning the last 20 rows. If LIMIT 1000000, 100 is used, it will scan 1000100 rows. In a high-concurrency application, every query would need to scan over 1 million rows.

ElasticSearch

From a business perspective, ElasticSearch is not a typical database; it is a search engine. If the desired data is not found through filter conditions, continuing with deep pagination will not yield the desired results. To put it another way, if we use ES as a database, we will encounter the max_result_window limitation when paginating. The official documentation even tells you that the maximum offset limit is 10,000.

Query Process:

For example, if querying page 501 with 10 records per page, the client sends the request to a certain node.
This node broadcasts the data to each shard, and each shard queries the first 5010 rows of data, returning the results to that node.
The data is then aggregated, and the first 5010 rows are extracted.
The results are returned to the client.

Solution

Original Pagination SQL:

Page 1

SELECT * FROM year_score WHERE year = 2017 ORDER BY id LIMIT 0, 20;

Page N

SELECT * FROM year_score WHERE year = 2017 ORDER BY id LIMIT (N - 1) * 20, 20;

By using context, rewrite it as:

Represents known data

SELECT * FROM year_score WHERE year = 2017 AND id > XXXX ORDER BY id LIMIT 20;

As mentioned in the SQL optimization article, LIMIT will stop querying once the condition is met, so this approach drastically reduces the scan volume and greatly improves efficiency!

ElasticSearch Solution

The solution is the same as MySQL, and now we can freely use the FROM-TO API without worrying about the maximum limit.

MySQL Deep Pagination Example:

As mentioned in the SQL optimization article, here is the handling technique for deep pagination in MySQL:

Bad Example (Time: 129.570s)

SELECT * FROM task_result LIMIT 20000000, 10;

Good Example (Time: 5.114s)

SELECT a.* FROM task_result a, (SELECT id FROM task_result LIMIT 20000000, 10) b WHERE a.id = b.id;

Explanation: The task_result table is a production table with 34 million rows, and id is the primary key. With an offset of 20 million, the core logic of this approach is based on the clustered index. It quickly retrieves the primary key ID of the specified offset data without needing to access the table again, and then uses the clustered index for the lookup, resulting in only 10 rows being retrieved, which is highly efficient.