Optimizing Optimizers
Current commercial optimizers do not work very well.
This is well known.
There are multiple problems with the current technology:
- Cardinality estimates used in current optimizers are flawed. Major problems include the “independence assumption,” whereby it is assumed that columns in a table or tables are uncorrelated, so simple formulas can be used for the size of the result of multiple filters and/or joins. These size estimates can be very inaccurate. Very skewed data values also generate the same sort of errors.
- The cost model used by current optimizers is flawed. It assumes a linear combination of resources and does not take cache effects into account. Hence, the cost predicted by the optimizer may not be closely correlated with actual run time.
- Cardinality estimates used in current optimizers are flawed. Major problems include the “independence assumption,” whereby it is assumed that columns in a table or tables are uncorrelated, so simple formulas can be used for the size of the result of multiple filters and/or joins. These size estimates can be very inaccurate. Very skewed data values also generate the same sort of errors.
- The cost model used by current optimizers is flawed. It assumes a linear combination of resources and does not take cache effects into account. Hence, the cost predicted by the optimizer may not be closely correlated with actual run time.
- The search space is invariably truncated, because it is too large to search exhaustively. Hence, the optimizer may miss a promising plan because of “early termination.”
Our intuitive experience is that the third possible problem affects the running time of the optimizer but does not radically affect plan quality. Current optimizers usually run until the rate of plan improvement does not justify continuing to look. In experimental settings, continuing to look very rarely results in better plans.
The first problem, cardinality estimates, is a known serious problem, so we have focused there initially. Specifically, we have engineered a version of the Postgres optimizer to allow cardinality estimates to be fed from outside. Moreover, on the Join Order Benchmark, we laboriously computed the actual size of all possible intermediate results. Then, we ran the optimizer under the following circumstances:
Basecase: Normal Postgres
Perfect cardinality (n): the optimizer is fed with perfect cardinality estimates for all possible intermediate joins results containing n joins
Our results showed that significant improvement in plan quality did not occur until Perfect (4). In other words, getting the cardinalities of all 2-way joins correct made very little difference. Hence, work on improving cardinality statistics is unlikely to help optimizer quality. In contrast, re-optimizing the plan when it “goes off the rails” works quite well. A paper on this work is in preparation.
In the future, we will look for other tactics that can help plan quality. Where appropriate, we will try to use machine learning techniques to advantage.
Citations