Tariff-Aware Purchase Order Extraction v0.2

Charles Dana · Monce AI · April 2026

alma.aws.monce.ai · Internal technical note

Abstract

ALMA Componentes receives Purchase Orders where the Pr.tarifa, Dto, and IMPORTE columns are systematically empty — the customer sends only quantities and expects ALMA to price the order from its master tariff. We formalize the pricing step as a lookup and aggregation problem over a two-table cross-reference graph and prove that the output is deterministic, auditable, and invariant to input-naming drift.

1. Problem statement

Let a Purchase Order be a set of n line items

L = { ℓ1, ℓ2, …, ℓn }

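where each ℓi = (pi, ri, qi, ui, di) with pi the customer's internal product code, ri the ALMA/AXALYS reference printed on the line, qi the ordered quantity, ui the unit price extracted from the PO (⊥ when the column is empty), and di the discount in percent.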

Stage 4.5 must return a pricing assignment π : L → ℝ≥0 × ℝ≥0 mapping each line to (unit_price, line_total). When ui ≠ ⊥ the extraction honors it; when ui = ⊥ we must derive a price.

2. The tariff graph

Let T = TGPAO ∪ TAXALYS be the disjoint union of the two tariff tables. TGPAO keys the 155 internal ERP codes (T01A1XX1, …), TAXALYS keys the 158 customer-facing refs (07MA1P250, 11A1, …). Each node carries a canonical price ρ(t) ∈ ℝ>0.

An alias function A : Σ* → T ∪ {⊥} maps customer-visible names to tariff keys (e.g. A(AUDAX1UCR6ZK00) = 07MA1P250). A is defined on only a few names and returns ⊥ everywhere else; currently |dom(A)| = 14, covering the AUDAX family.
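As a rough illustration, the two tables and the alias map could be held in memory as plain dictionaries. This is a sketch of one possible representation, not the production loader: TariffRow, build_tariff, RawTable, the GPAO price and the GPAO label are hypothetical placeholders. Only the 07MA1P250 row, its 6.08 EUR price and the AUDAX alias come from this note.

```python
from typing import Dict, NamedTuple, Tuple


class TariffRow(NamedTuple):
    ref: str      # tariff key t
    price: float  # canonical price rho(t), in EUR
    source: str   # which table (and label) the row came from


RawTable = Dict[str, Tuple[float, str]]  # ref -> (price, label)


def build_tariff(gpao: RawTable, axalys: RawTable) -> Dict[str, TariffRow]:
    """Merge the two tables into T, refusing keys that appear in both."""
    overlap = gpao.keys() & axalys.keys()
    if overlap:
        raise ValueError(f"tables are not disjoint: {sorted(overlap)}")
    merged: Dict[str, TariffRow] = {}
    for prefix, table in (("gpao", gpao), ("axalys", axalys)):
        for ref, (price, label) in table.items():
            if price <= 0:
                raise ValueError(f"non-positive price for {ref}")
            merged[ref] = TariffRow(ref, price, f"{prefix}:{label}")
    return merged


# Placeholder GPAO row (price and label invented for illustration only).
GPAO: RawTable = {"T01A1XX1": (1.00, "placeholder")}
# Row taken from the worked example in section 7.
AXALYS: RawTable = {"07MA1P250": (6.08, "Audax ALU UCR")}

T = build_tariff(GPAO, AXALYS)
ALIASES: Dict[str, str] = {"AUDAX1UCR6ZK00": "07MA1P250"}  # alias function A
```

Keeping the source label on every row is what later lets a priced line report where its price came from.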

3. Lookup cascade

Given a candidate string s, define Λ : Σ* → T ∪ {⊥} as the first-match cascade:

Λ(s) = A(s)  if A(s) ≠ ⊥                                         (alias)
       s     if s ∈ T                                            (exact)
       t     if ∃ t ∈ T : norm(t) = norm(s)                      (normalized)
       t     if ∃ t ∈ T : norm(s).startswith(norm(t)) ∧ k ≥ 6    (prefix)
       ⊥     otherwise

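where norm(x) = x.strip("-_ ").upper() and k = min(|norm(s)|, |norm(t)|). Note that strip removes only leading and trailing characters, so separators inside a reference are preserved. Clauses are tried top to bottom and the first one whose condition holds determines the result, so every input ends in exactly one of the five cases.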
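The cascade can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production code: the names TARIFF_KEYS, ALIASES, MIN_PREFIX and lookup are hypothetical, the table holds three sample keys rather than all 313, and None stands in for ⊥. Ties inside the normalized and prefix clauses are resolved by list order, which is one possible choice.

```python
from typing import Dict, List, Optional

# Sample keys only; list order is the (assumed) tie-break order.
TARIFF_KEYS: List[str] = ["T01A1XX1", "07MA1P250", "11A1"]
ALIASES: Dict[str, str] = {"AUDAX1UCR6ZK00": "07MA1P250"}
MIN_PREFIX = 6  # the k >= 6 bound of the prefix clause


def norm(x: str) -> str:
    """Trim leading/trailing '-', '_' and spaces, then upper-case."""
    return x.strip("-_ ").upper()


def lookup(s: str) -> Optional[str]:
    """First-match cascade: alias, exact, normalized, prefix, else None."""
    # 1. alias
    if s in ALIASES:
        return ALIASES[s]
    # 2. exact
    if s in TARIFF_KEYS:
        return s
    ns = norm(s)
    # 3. normalized equality
    for t in TARIFF_KEYS:
        if norm(t) == ns:
            return t
    # 4. prefix: norm(s) starts with norm(t), and k >= MIN_PREFIX
    for t in TARIFF_KEYS:
        nt = norm(t)
        if ns.startswith(nt) and min(len(ns), len(nt)) >= MIN_PREFIX:
            return t
    # 5. no match
    return None
```

With these sample keys, lookup("07MA1P250-XL") falls through to the prefix clause and returns "07MA1P250", while lookup("11A1-B") returns None because the matching key is shorter than six characters.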

4. Pricing assignment

For a line ℓi = (pi, ri, qi, ui, di), set

ti = Λ(pi) if Λ(pi) ≠ ⊥ else Λ(ri)

Then the pricing assignment is

π(ℓi) = ( ρ(ti), qi · ρ(ti) · (1 − di/100) )   when ui = ⊥
π(ℓi) = ( ui, qi · ui · (1 − di/100) )         when ui ≠ ⊥

In the latter case, if |ui − ρ(ti)| / ρ(ti) ≥ τ we tag the line with a divergence flag. The default threshold is τ = 0.10, i.e. a 10% relative deviation from the tariff price.
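The two branches and the divergence test can be sketched as follows. This is an illustration under assumptions rather than the Stage 4.5 implementation: Priced and price_line are hypothetical names, the tariff price ρ(ti) is passed in already resolved (None when the lookup returned ⊥), totals are rounded to two decimals, and a line with neither a PO price nor a tariff row is returned as None. None of these three choices is specified in this note.

```python
from dataclasses import dataclass
from typing import Optional

TAU = 0.10  # default divergence threshold from the text


@dataclass
class Priced:
    unit_price: float
    line_total: float
    divergent: bool = False


def price_line(q: float, u: Optional[float], d: float,
               rho: Optional[float], tau: float = TAU) -> Optional[Priced]:
    """Return (unit_price, line_total) for one line, or None if unpriceable."""
    factor = 1 - d / 100  # d is a discount in percent
    if u is None:
        if rho is None:
            return None  # assumption: unresolved line, left for review
        return Priced(rho, round(q * rho * factor, 2))
    # The PO carries its own price: honor it, but compare with the tariff.
    divergent = rho is not None and abs(u - rho) / rho >= tau
    return Priced(u, round(q * u * factor, 2), divergent)
```

For instance, price_line(60, None, 0, 6.08) yields a unit price of 6.08 and a total of 364.8, while price_line(60, 6.70, 0, 6.08) keeps the PO price and sets the flag, since 0.62 / 6.08 is just above 0.10. A production version would more likely use decimal.Decimal than float for currency.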

5. Soundness

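Determinism. Λ is deterministic: each clause of the cascade either fires or not, and earlier clauses dominate. The normalized and prefix clauses quantify over t ∈ T, so this also requires that T be scanned in a fixed order, so that ties always resolve to the same t. Under that condition, two queries with the same input always return the same tariff row.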

Auditability. Each priced line carries the tuple (tariff_ref, tariff_price, tariff_source), so Isabel can reconstruct ρ(ti) from the original XLSX at any time.

Naming invariance. Adding an alias entry s ↦ t to A reclassifies every historical ⊥-line whose p or r equals s, without re-extracting the PDF. This is immediate because A is read at import, not baked into the model.
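The effect of adding an alias can be shown on stored extraction output. The sketch below is a simplified assumption: resolve covers only the alias and exact clauses, stored_lines is a hypothetical record format, and the single tariff row is the one from the worked example in section 7.

```python
from typing import Dict, List, Optional

TARIFF: Dict[str, float] = {"07MA1P250": 6.08}  # price from the worked example
aliases: Dict[str, str] = {}                    # alias function A, initially empty


def resolve(p: str, r: str) -> Optional[str]:
    """Try p first, then r; alias and exact clauses only (simplified)."""
    for s in (p, r):
        if s in aliases:
            return aliases[s]
        if s in TARIFF:
            return s
    return None


# Stored extraction output; the PDF itself is not consulted again.
stored_lines: List[dict] = [{"p": "CI450968", "r": "AUDAX1UCR6ZK00", "q": 60}]

before = [resolve(line["p"], line["r"]) for line in stored_lines]

# A new alias entry is added to A ...
aliases["AUDAX1UCR6ZK00"] = "07MA1P250"

# ... and the same stored lines now resolve to a tariff key.
after = [resolve(line["p"], line["r"]) for line in stored_lines]
```

Here before is [None] and after is ["07MA1P250"]: only the alias map changed, not the extracted line.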

6. Complexity

Per-line lookup: O(1) for the alias and exact clauses (dict lookup), and O(|T| · k) in the worst case, when the normalized and prefix clauses scan the whole table. For |T| = 313 and k ≤ 20 this is < 50 μs per line on the production EC2. Full-PO latency is dominated by the VLM stages 0–2, so pricing enrichment adds negligible overhead.

7. Worked example

Centroalum PO PC26-001203, line 3:

p = "CI450968"              # Centroalum internal
r = "AUDAX1UCR6ZK00"         # ALMA/AXALYS ref
q = 60, u = ⊥, d = 0
Λ(p) = ⊥          (not in T)
Λ(r) = A("AUDAX1UCR6ZK00")
       = "07MA1P250"     (alias hit)
ρ("07MA1P250") = 6.08 EUR   (axalys:Audax ALU UCR)
π = (6.08, 60 × 6.08 × 1.00) = (6.08, 364.80)

This is exactly the number shown in /ui today.

8. Limits & open problems