Estimator Benchmark Help
This page explains every section and field in the benchmark dashboard in plain language.
Top Toolbar
These controls decide what data is loaded and what is compared.
- Algorithm: Selects the main estimation method shown in all charts and tables.
- Scope: Chooses which map family to measure.
- Compare With: Selects a second algorithm for side-by-side comparison. Off means no comparison.
- Compare Scope: Scope used by the comparison algorithm. It follows the same meaning as the main scope.
- Reload Data: Refreshes data from the data folder.
- Upload Data Folder: Imports local comma-separated value files from your machine.
- Download Current Data: Exports everything currently shown as one JavaScript Object Notation file.
- Status Badge: Shows loading status such as loading, ready, or error.
- Dataset Info: Shows active algorithm, active scope, map count, error count, and generation time.
Summary Cards
These cards show overall quality of the selected algorithm and scope.
- Total Maps: Number of maps currently included in this scope.
- Valid Maps: Number of maps with a usable predicted value and a usable expected value.
- MAE (Mean Absolute Error): Average size of prediction error, ignoring direction.
- RMSE (Root Mean Squared Error): Error measure that gives extra weight to large mistakes.
- Bias: Average signed error (expected minus predicted). Positive means predictions are lower than expected on average. Negative means predictions are higher than expected on average.
- Median |Delta| (Median Absolute Error): Middle absolute error after sorting from small to large.
- Coverage: Percentage of maps that produced valid numeric output.
- P90 |Delta| (Ninetieth Percentile Absolute Error): Absolute error level that ninety percent of maps fall at or below.
- Maximum Underestimate: Largest positive value of expected minus predicted, meaning the prediction was too low.
- Maximum Overestimate: Most negative value of expected minus predicted, meaning the prediction was too high.
- Expected And Predicted Fit: A trend fit percentage between expected values and predicted values. Higher means closer shape matching.
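As a rough illustration of how the summary cards relate to each other, the sketch below computes most of them from a list of expected and predicted values. It is not the dashboard's actual code. The function name summarize, the dictionary keys, the nearest-rank percentile rule, and the choice to define coverage as valid maps divided by total maps are all assumptions made for this example. The Expected And Predicted Fit card is left out because its exact formula is not described here.

```python
import math
import statistics


def summarize(rows):
    """rows: list of (expected, predicted) pairs; either value may be None.

    Hypothetical helper for illustration only.
    """
    total = len(rows)
    # A map is valid when both values are usable.
    valid = [(e, p) for e, p in rows if e is not None and p is not None]
    # Signed error follows the table's Difference column: expected minus predicted.
    diffs = [e - p for e, p in valid]
    abs_diffs = sorted(abs(d) for d in diffs)
    n = len(diffs)
    if n == 0:
        return {"total_maps": total, "valid_maps": 0}
    # Nearest-rank ninetieth percentile (an assumed convention).
    p90_index = max(0, math.ceil(0.9 * n) - 1)
    return {
        "total_maps": total,
        "valid_maps": n,
        "mae": sum(abs_diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
        "bias": sum(diffs) / n,
        "median_abs": statistics.median(abs_diffs),
        "p90_abs": abs_diffs[p90_index],
        "max_underestimate": max(diffs),  # most positive signed error
        "max_overestimate": min(diffs),   # most negative signed error
        "coverage_pct": 100.0 * n / total,  # assumed: valid maps / total maps
    }
```

For example, summarize([(5.0, 4.0), (3.0, 3.5), (2.0, None), (4.0, 4.0)]) treats three of the four maps as valid, so the assumed coverage is 75 percent and the bias is slightly positive because the one underestimate is larger than the one overestimate.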
Accuracy Bands
Each map is grouped by absolute error size.
- Exact: Absolute error is at most 0.2.
- Close: Absolute error is above 0.2 and at most 0.5.
- Moderate: Absolute error is above 0.5 and at most 1.0.
- Miss: Absolute error is above 1.0.
- Clicking a band in the chart applies a table filter.
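The band thresholds above can be written as a small function. This is a minimal sketch, and the function name accuracy_band is a hypothetical choice for illustration. Only the thresholds and band names come from this page.

```python
def accuracy_band(abs_error):
    """Map an absolute error to its accuracy band name."""
    if abs_error <= 0.2:
        return "Exact"
    if abs_error <= 0.5:
        return "Close"
    if abs_error <= 1.0:
        return "Moderate"
    return "Miss"
```

Each upper bound is inclusive, so an absolute error of exactly 0.5 is Close and exactly 1.0 is Moderate.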
Charts
Charts help you understand error distribution and behavior patterns.
- Accuracy Breakdown: Shares of exact, close, moderate, and miss maps.
- Expected Versus Predicted Scatter: Each dot is one map. Closer to the diagonal reference line means better prediction.
- Error Distribution: Histogram of signed error values.
- Expected And Predicted Trend: Expected and predicted lines sorted by expected difficulty.
- Pattern Mean Absolute Error: Mean absolute error by pattern group. Click a bar to filter that pattern.
- Sub Pattern Mean Absolute Error: Mean absolute error by sub pattern group. Click a bar to filter that sub pattern.
- Head To Head: Compares the absolute error of the base algorithm and the comparison algorithm on matched maps.
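The Pattern Mean Absolute Error and Sub Pattern Mean Absolute Error charts are, conceptually, a group-by over the valid rows. The sketch below shows one way to compute the bar heights. The function name mae_by_pattern and the tuple layout of the input are assumptions for this example, not the dashboard's real data model.

```python
from collections import defaultdict


def mae_by_pattern(rows):
    """rows: iterable of (pattern, expected, predicted) for valid maps.

    Returns a dict mapping each pattern to its mean absolute error.
    Hypothetical helper for illustration only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pattern, expected, predicted in rows:
        sums[pattern] += abs(expected - predicted)
        counts[pattern] += 1
    return {pattern: sums[pattern] / counts[pattern] for pattern in sums}
```

Passing the sub pattern tag in place of the pattern gives the values for the sub pattern chart in the same way.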
Benchmark Maps Table
Main table for valid benchmark rows.
- Name: Map name used by the benchmark source file.
- Expected: Target difficulty value from your benchmark source.
- Predicted: Predicted difficulty label. Hover to see the numeric value.
- Difference: Expected value minus predicted value.
- Absolute Difference: Magnitude of the difference, ignoring sign.
- Pattern: Main pattern family.
- Sub Pattern: Detailed pattern tag.
- Band: Accuracy band for the row.
- Compare Predicted: Predicted value from comparison algorithm when comparison is enabled.
- Compare Absolute Difference: Absolute difference from comparison algorithm.
- Winner: Which algorithm has the smaller absolute difference for this row.
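To make the derived columns concrete, the sketch below builds the Difference, Absolute Difference, Compare Absolute Difference, and Winner values for one row. The function name compare_row, the dictionary keys, the "base" and "compare" labels, and the "tie" result for equal errors are assumptions for illustration. This page does not say how the dashboard labels ties.

```python
def compare_row(expected, predicted, compare_predicted=None):
    """Derive the comparison columns for a single valid row.

    Hypothetical helper for illustration only.
    """
    difference = expected - predicted  # expected minus predicted
    row = {
        "difference": difference,
        "absolute_difference": abs(difference),
    }
    if compare_predicted is not None:
        compare_abs = abs(expected - compare_predicted)
        row["compare_absolute_difference"] = compare_abs
        if row["absolute_difference"] < compare_abs:
            row["winner"] = "base"
        elif compare_abs < row["absolute_difference"]:
            row["winner"] = "compare"
        else:
            row["winner"] = "tie"  # tie handling is an assumption
    return row
```

When comparison is off, compare_predicted is omitted and the row carries only the difference columns.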
Error Maps
Rows that cannot be used for numeric statistics.
- Invalid: The algorithm does not support the map.
- Failed: A runtime or parsing error occurred during estimation.
- Missing: Map file could not be found by name and could not be downloaded by bid.
- Detail: Additional reason for the error status.