ALMA PO Extraction

A 7-stage VLM pipeline for extracting ALMA Componentes purchase orders (POs). Upload any PO (a Pedido, Comanda, or Commande, in Spanish, Catalan, or French) and get back structured JSON, Snake matches, and a trust score within seconds.

POST /extract

Unified extraction. Upload a PDF, PNG, or JPG; images are converted to PDF automatically, then the full pipeline runs. Returns a task_id for asynchronous polling via GET /extract/{id}.
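As a rough illustration of the upload step, the sketch below builds (but does not send) a multipart POST /extract request using only the Python standard library. The base URL, the form-field name "file", and the helper name build_extract_request are assumptions made for the example; the service's real request schema is not documented here.

```python
import mimetypes
import urllib.request
import uuid


def build_extract_request(base_url: str, filename: str, payload: bytes) -> urllib.request.Request:
    """Build, without sending, a multipart/form-data POST to /extract."""
    boundary = uuid.uuid4().hex
    # Guess the MIME type from the file extension; fall back to generic binary.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # NOTE: "file" is an assumed form-field name, not confirmed by the service.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/extract",
        data=head + payload + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit the document; the JSON response is then expected to carry the task_id used for polling.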

POST /stage_0

Fast synchronous identification of the customer (LABI, Macrisal, Thermia, Maldonado, Q. Angles, Aluminios Barcelona…), CIF, PO number, and language. A regex tier answers in ~5 ms; when it cannot identify the document, a Haiku VLM fallback takes ~1 s.
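A minimal sketch of what a regex tier like this might look like, assuming the document text is already available as a string. The patterns, the identify function, and the returned dictionary keys are illustrative, not the service's actual rules. The CIF pattern checks only the shape (organisation letter, seven digits, control character) and does not verify the control character; the language is inferred from the title keyword alone.

```python
import re
from typing import Optional

# Spanish CIF shape: one organisation letter, seven digits, one control character.
CIF_RE = re.compile(r"\b([ABCDEFGHJNPQRSUVW]\d{7}[0-9A-J])\b")

# Document-title keywords mapped to language codes (assumed heuristic).
TITLE_LANG = {"pedido": "es", "comanda": "ca", "commande": "fr"}
TITLE_RE = re.compile(r"\b(pedido|comanda|commande)\b", re.IGNORECASE)


def identify(text: str) -> Optional[dict]:
    """Regex tier: return CIF and language, or None to signal a VLM fallback."""
    cif = CIF_RE.search(text.upper())
    title = TITLE_RE.search(text)
    if cif is None or title is None:
        # The caller would hand the document to the slower VLM tier here.
        return None
    return {"cif": cif.group(1), "language": TITLE_LANG[title.group(1).lower()]}
```

Returning None rather than a partial result keeps the fast path strict: anything the patterns cannot fully resolve goes to the fallback.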

GET /extract/{id}

Polls for the extraction result. Returns the full structured PO data with matching results, validation flags, and the routing decision.
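Polling could be wrapped in a small loop like the one below. This is a sketch under assumptions: the "status" field and its "pending" / "processing" values are hypothetical names, and fetch stands in for whatever function performs the GET /extract/{id} call and decodes the JSON.

```python
import time
from typing import Callable


def poll_extraction(
    fetch: Callable[[str], dict],
    task_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> dict:
    """Call fetch(task_id) until the task is no longer pending, or give up."""
    for attempt in range(max_attempts):
        result = fetch(task_id)
        # "status" and its values are assumed names, not the documented schema.
        if result.get("status") not in ("pending", "processing"):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} polls")
```

Injecting fetch keeps the loop independent of any HTTP library and makes it easy to exercise with a stub.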

Pipeline

0. Client ID: regex with Haiku fallback; identifies customer, CIF, and language (es/ca/fr), in ~5 ms when the regex tier matches
1. Document Analyzer: classifies the Pedido/Comanda/Commande layout and skips LOPD/RGPD data-protection boilerplate
2. Unified Extractor: Sonnet VLM with EU decimal parsing (1.309,52 → 1309.52)
3. Rules Engine: ISO dates, IVA at 21%, base imponible (taxable base) arithmetic
4. Snake Matcher: matches AUDAX / pasadores (bolts) / bisagras (hinges) / manillas (handles) against the ALMA basis
5. Validation: Haiku cross-checks the extraction against the original PDF
6. Router: auto-approve or human review, based on the trust score
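The EU decimal parsing in stage 2 and the IVA arithmetic in stage 3 can be sketched as follows. The function names, flag strings, and one-cent tolerance are assumptions for illustration. The parser also assumes every amount uses the European convention (dot for thousands, comma for decimals), so an English-style value such as 12.5 would be misread.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List

IVA_RATE = Decimal("0.21")  # standard rate named in stage 3
CENT = Decimal("0.01")


def parse_eu_decimal(text: str) -> Decimal:
    """Convert an EU-formatted amount such as '1.309,52' to Decimal('1309.52')."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def check_totals(base: Decimal, iva: Decimal, total: Decimal,
                 tolerance: Decimal = CENT) -> List[str]:
    """Return flags for amounts that do not add up; an empty list means consistent."""
    flags = []
    expected_iva = (base * IVA_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    if abs(expected_iva - iva) > tolerance:
        flags.append("iva_mismatch")  # hypothetical flag name
    if abs(base + iva - total) > tolerance:
        flags.append("total_mismatch")  # hypothetical flag name
    return flags
```

Decimal is used instead of float so that cent-level comparisons are exact; the tolerance absorbs rounding differences between the supplier's arithmetic and the check.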