Extraction accuracy is validated systematically: charts are generated from known data, each model extracts the values, and the extracted values are compared to ground truth. Comparing against known ground truth is the only way to measure extraction accuracy directly.
Average accuracy is computed across all three test cases (bar, multi-line, and log-scale charts) and is based on the mean absolute percentage error (MAPE) against the known ground-truth data.
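
A MAPE-based score can be sketched as follows. This is a minimal illustration, not the benchmark's actual scoring code: the function names `mape` and `accuracy` are hypothetical, and the convention that accuracy equals 100 minus MAPE (floored at zero) is an assumption that the report does not state explicitly. The sample values are illustrative and do not come from the test charts.

```python
from typing import Sequence


def mape(truth: Sequence[float], extracted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, of two equal-length series."""
    if len(truth) != len(extracted):
        raise ValueError("series must have the same length")
    if not truth:
        raise ValueError("series must not be empty")
    if any(t == 0 for t in truth):
        raise ValueError("MAPE is undefined when a ground-truth value is zero")
    total = sum(abs(e - t) / abs(t) for t, e in zip(truth, extracted))
    return 100.0 * total / len(truth)


def accuracy(truth: Sequence[float], extracted: Sequence[float]) -> float:
    """Assumed convention: accuracy = 100 - MAPE, floored at 0."""
    return max(0.0, 100.0 - mape(truth, extracted))


# Illustrative values only (not from the benchmark charts).
truth = [100, 200, 400]
extracted = [100, 190, 400]
print(round(accuracy(truth, extracted), 2))
```

Because MAPE normalizes each error by the true value, a 10-unit miss on a value of 200 counts the same as a 1-unit miss on a value of 20, which keeps scores comparable between linear and log-scale charts.
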
The bar chart was generated from exact values, so the correct height of every bar is known precisely.
| Model | Product A | Product B | Product C | Accuracy |
|---|---|---|---|---|
| Ground Truth | [120, 145, 132, 168] | [98, 112, 125, 142] | [85, 92, 88, 95] | — |
| Qwen3-VL 32B | [120, 145, 132, 168] | [98, 112, 125, 142] | [85, 92, 88, 95] | 99.9% |
| Qwen3-VL 8B | [119, 145, 132, 168] | [98, 112, 125, 142] | [85, 92, 88, 95] | 99.6% |
| Nemotron 12B | [120, 145, 132, 168] | [98, 112, 125, 142] | [85, 92, 88, 95] | 100% |
| Llama 4 Scout | ❌ JSON parse failed | | | 0% |
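
A parse failure like the one in the last row can be scored as 0% without crashing the harness. The report does not show how replies are parsed, so the following is only a sketch: it assumes each model is asked to return a JSON object mapping series names to lists of values, and `parse_series` is a hypothetical helper name.

```python
import json
from typing import Optional


def parse_series(raw: str) -> Optional[dict]:
    """Return the JSON object found in a model reply, or None if parsing fails."""
    # Models often wrap JSON in extra prose, so keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


reply = 'Here is the data: {"Product A": [120, 145, 132, 168]}'
print(parse_series(reply))
print(parse_series("I cannot read this chart."))
```

A `None` result would then be recorded as a failed extraction (0%) rather than as an exception.
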
| Model | Treatment A Values | Accuracy |
|---|---|---|
| Ground Truth | [100, 85, 72, 61, 53, 47, 42] | — |
| Qwen3-VL 32B | [100, 85, 72, 61, 53, 47, 42] | 99.8% |
| Qwen3-VL 8B | [100, 85, 72, 61, 53, 47, 42] | 99.9% |
| Nemotron 12B | [100, 85, 72, 61, 53, 47, 42] | 100% |
| Llama 4 Scout | [100, 85, 70, 60, 52, 46, 41] | 92.5% |
| Model | Viscosity Values | Accuracy |
|---|---|---|
| Ground Truth | [10000, 1000, 100, 10, 1] | — |
| Qwen3-VL 32B | [10000, 1000, 100, 10, 1] | 100% |
| Qwen3-VL 8B | [10000, 1000, 100, 10, 1] | 100% |
| Nemotron 12B | [10000, 1000, 100, 10, 1] | 100% |
| Llama 4 Scout | [10000, 1000, 100, 10] (4 pts) | 100%* |
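\* Llama 4 Scout returned only four of the five points. The four it returned were correct, but it missed the last value (1), so the 100% score applies only to the extracted points.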
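
One way to keep a missing point from hiding behind a perfect score is to report coverage alongside accuracy. The sketch below is an assumption about how that could be done, not the method used for the tables above. `score_with_coverage` is a hypothetical name, and the sketch assumes that extracted points align with the ground truth by position from the start and that no ground-truth value is zero.

```python
from typing import Sequence, Tuple


def score_with_coverage(
    truth: Sequence[float], extracted: Sequence[float]
) -> Tuple[float, float]:
    """Return (accuracy in percent on matched points, fraction of points extracted)."""
    n = min(len(truth), len(extracted))
    if n == 0:
        return 0.0, 0.0
    # Compare only the points that both series have, matched by position.
    errors = [abs(e - t) / abs(t) for t, e in zip(truth[:n], extracted[:n])]
    acc = max(0.0, 100.0 * (1 - sum(errors) / n))
    return acc, n / len(truth)


truth = [10000, 1000, 100, 10, 1]
extracted = [10000, 1000, 100, 10]  # last point missing
print(score_with_coverage(truth, extracted))  # (100.0, 0.8)
```

Reporting the pair separates two failure modes: wrong values lower the first number, while missing values lower the second.
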
Qwen3-VL 32B is the recommended replacement for Claude in chart extraction workflows.
For zero-cost deployments, NVIDIA Nemotron 12B VL achieved 100% accuracy on all three test charts via OpenRouter's free tier (slower, but with no API costs).