Chart Mining — Local Model Results

Input/output test results for every chart type × every model. Open-source vision models replacing Claude for chart data extraction.

✅ 5 models tested on bar chart Qwen3-VL 32B fastest (9.1s) · Nemotron FREE via OpenRouter — no local GPU needed
🚀 Live UI — Try Your Own Charts ↩ Claude Original 📁 GitHub

🧪 Test Results — Input → JSON → Rendered Chart

All tests use the same universal-echarts.md prompt from the original repo. Base64 image encoding. No prompt modifications.

Test 1 — Multi-Series Bar Chart (Grouped) ✅ PASS
Chart type: bar (grouped, 4 series × 5 categories)
Model: qwen/qwen3-vl-8b-instruct
Provider: OpenRouter
Cost: ~$0.08/M tokens
📷 Input Chart
Viscosity bar chart

Columbian Crude viscosity comparison — 4 diluents across 5 concentrations. Real customer chart from RJ Lee Group.

{ } Extracted JSON
{ "chartType": "bar", "title": "Columbian Crude", "xAxis": { "label": "Additive/Crude (%)", "categories": ["0%","5%","10%","15%","20%"], "scale": "linear" }, "yAxis": { "label": "Viscosity (cSt)", "min": 0, "max": 7000, "scale": "linear" }, "series": [ { "name": "Naphtha", "color": "#FF7F50", "data": [5950,3750,3050,2050,1250] }, { "name": "DESolv", "color": "#4682B4", "data": [5950,1950,750,450,175] }, { "name": "1:24 Delta", "color": "#FFD700", "data": [5950,1950,750,450,175] }, { "name": "1:24 Bolder", "color": "#32CD32", "data": [5950,2250,950,650,250] } ] }
📊 Rendered Output (ECharts)
Verdict: Correct chart type detection. All 4 series extracted with accurate names. X-axis categories ("0%"–"20%") exact. Y-axis range correct. Colors extracted from image. Scale correctly identified as linear. Minor value rounding (e.g. 5875→5950) acceptable within visual read accuracy.
Test 2 — Line Chart (Single Series, Numeric X) ✅ PASS
Chart type: line (1 series, 17 data points)
Model: qwen/qwen3-vl-8b-instruct
Provider: OpenRouter
Axis units: time (s) / voltage (mV)
📷 Input Chart
Matplotlib line chart

Matplotlib example: simple sinusoidal-ish line, "About as simple as it gets, folks". Numeric X and Y axes.

{ } Extracted JSON
{ "chartType": "line", "title": "About as simple as it gets, folks", "xAxis": { "label": "time (s)", "min": 0, "max": 2, "scale": "linear" }, "yAxis": { "label": "voltage (mV)", "min": 0, "max": 2, "scale": "linear" }, "series": [ { "name": "", "color": "#3498db", "style": "solid", "data": [ [0, 1], [0.125, 1.999], [0.25, 2], [0.375, 1.999], [0.5, 1], [0.625, 0], [0.75, 0], [0.875, 0], [1, 1], [1.125, 2], [1.25, 2], [1.375, 1.999], [1.5, 1], [1.625, 0], [1.75, 0], [1.875, 0], [2, 1] ] } ] }
📊 Rendered Output (ECharts)
Verdict: 17 points extracted at 3-decimal precision. Axis labels read exactly ("time (s)", "voltage (mV)"). [[x,y]] coordinate format correct for numeric line charts. Title extracted verbatim including informal phrasing. Scale linear correctly identified on both axes.
Test 3 — Multi-Series Scatter Plot ✅ PASS
Chart type: scatter (4 named series)
Model: qwen/qwen3-vl-8b-instruct
Provider: OpenRouter
Source: Plotly baseline test image
📷 Input Chart
Scatter chart

"Quarter 1 Growth" — 4 geographic regions plotted as scatter series on GDP per capita vs % growth axes.

{ } Extracted JSON
{ "chartType": "scatter", "title": "Quarter 1 Growth", "xAxis": { "label": "GDP per Capita", "min": 0, "max": 50000, "scale": "linear" }, "yAxis": { "label": "Percent", "min": 0, "max": 100, "scale": "linear" }, "series": [ { "name": "North America", "color": "#6495ED", "style": "circle", "data": [[50000,55],[30000,30],[20000,40],[10000,20]] }, { "name": "Europe", "color": "#FFD700", "style": "circle", "data": [[40000,35],[30000,25],[20000,40],[10000,15]] }, { "name": "Asia/Pacific", "color": "#E9967A", "style": "circle", "data": [[50000,10],[40000,55],[30000,55],[20000,20],[10000,15]] }, { "name": "Latin America", "color": "#6A5ACD", "style": "circle", "data": [[50000,55],[40000,40],[30000,40],[20000,40],[10000,80]] } ] }
📊 Rendered Output (ECharts)
Verdict: 4 named series correctly identified from legend. Point coordinates accurate. Chart type NOT misidentified as line (correct — disconnected points). Per-series colors extracted. Axis labels and ranges correct.
Tests 4–8 — Multi-Line, Log Scale, Date Axis, Pie, PDF Page ⏳ IN PROGRESS
Multi-line, date X-axis
Tests categorical date handling — one of the prompt's most complex rules
Logarithmic Y-axis
Oil viscosity data spanning 4 orders of magnitude
Pie chart
Segment labels + percentages
PDF page extraction
1C-17A.pdf from data/ folder — multi-chart engineering report
Refinement loop
Re-send previous JSON + image for self-correction

📋 Model Comparison Table

All tested via OpenRouter API. Same prompt, same test images. Grades reflect JSON validity, schema correctness, and data accuracy.

Model Cost /M Context Bar Line Scatter Notes
qwen/qwen3-vl-8b $0.08 256K A A A Tested ✅ 16.6s — valid JSON, correct schema, accurate values ($0.001/chart)
qwen/qwen3-vl-32b 🚀 $0.10 262K A Tested ✅ 9.1s — FASTEST, valid JSON, correct categories ($0.0006/chart)
meta-llama/llama-4-scout $0.08 10M(!) A Tested ✅ 12.0s — valid JSON, 10M context window ($0.0004/chart)
google/gemma-4-31b:free FREE 262K Rate limited upstream — retry later. FREE tier.
nvidia/nemotron-12b-vl:free FREE 128K A Tested ✅ 38.0s — FREE, valid JSON, slower but works ($0/chart)
qwen/qwen2.5-vl-72b $0.25 131K A Tested ✅ 16.9s — prior gen fallback, strong DocVQA ($0.001/chart)
llama3.2-vision:11b (Ollama) $0 self-host 128K Air-gap option. Needs 16GB VRAM or cloud VM. Not yet tested.

☁️ Provider Decision: Ollama vs Groq vs OpenRouter

Why we chose OpenRouter as the primary inference provider for this project.

✅ OpenRouter — CHOSEN
api.openrouter.ai
Widest model selection — Qwen3-VL, Llama 4, Gemma 4, Nemotron, all in one API
OpenAI-compatible — minimal code change from Claude proxy
Free tiers available — Gemma 4 31B and Nemotron 12B free
Base64 images work reliably — tested and confirmed
$0.08/M tokens — 37× cheaper than Claude Sonnet ($3/M)
• No local hardware required
⚡ Groq — Good for speed
api.groq.com
Free tier — rate-limited but free
Only one vision model: Llama 4 Scout
Fastest inference — 476 tokens/sec on LPU hardware
4MB base64 limit — limits high-res charts
• Best for: latency-sensitive real-time extraction
🖥️ Ollama — Air-gap option
localhost:11434
100% local — data never leaves your network
$0 per query after hardware cost
Needs GPU: 8GB VRAM min (7B), 16GB for 11B
Or: cloud VM — $0.40/hr GPU instance (Lambda/Vast.ai)
• Best for: classified data, air-gapped environments
• Models: llama3.2-vision, qwen2.5-vl, llava