Tayler Erbe · Project Case Study · Procurement Analytics · 2025

Cross-Distributor Procurement
Intelligence System

An end-to-end AI system for identifying cost-saving opportunities across multi-vendor purchasing data — combining deterministic exact matching, FAISS semantic retrieval over 5.19 million vectors, and LLM equivalence reasoning with pack-size normalization across six million catalog rows from four distributors.

View on GitHub →
Domain
Procurement Analytics · Multi-Vendor Data
Status
Prototype Delivered · Demo-Ready
Role
Data Scientist · Full Lifecycle Ownership
Scale
6,068,888 catalog rows · 4 distributors
6M+
Catalog Rows Processed
42,052
Exact-Matched Product Pairs
$161K
Defensible Unit-Normalized Savings
49.1%
Pack-Size Inflation Removed

The Problem

Large organizations that purchase from multiple vendors face a structural data problem: vendor A describes a product one way, vendor B describes the same product differently, quantities are in inconsistent units, and pack sizes are buried inside product codes. No common identifier exists to join across sources. Nobody has been systematically comparing prices.

The instinct when confronted with this is to reach for a visualization tool or drop everything into an AI assistant and ask it to find patterns. Both approaches fail for the same reason: the data isn't ready. You cannot virtualize your way out of a data quality problem.

This project was scoped and built to solve that problem end-to-end — from raw multi-vendor exports through a three-tier AI pipeline to a live price comparison tool and spending dashboard, deployable to Azure for use by researchers and procurement staff.

Core Question

Which products are the same item sold at different prices by different vendors? And once you know that — what is the cheapest source, per unit, for each product?

The Data Reality

Four distributors. Six million catalog rows. Two had prior-year purchase quantities; two were catalog-only. Part numbers sparsely populated. Pack sizes encoded in product descriptions or manufacturer codes. No clean join key across any source.

Core Design Principle

Lead with what is deterministic. Do everything you can without AI before using AI. Exact matching is faster, cheaper, and more reliable than inference — and it produces the most defensible savings estimates for building a business case.

The Unit Normalization Problem

The single biggest data quality challenge in any multi-vendor procurement dataset is pack size inconsistency. A product listed at $317 from one vendor and $14 from another looks like a massive savings opportunity. In reality, the $317 listing is for a case of 500 and the $14 listing is for a pack of 50. The per-unit price is nearly identical. Acting on the raw comparison would be wrong — and embarrassing to present to leadership.

The naive fix is to parse pack sizes from descriptions using regex. This works for well-formatted descriptions but fails on terse vendor abbreviations, which represent the majority of rows.

The real fix came from the source data itself. The procurement platform's export contained a field called qsu ("quantity sold as") with values like 1, 50, 100, 500, 1000 — populated for 99.9% of rows (6,068,599 of 6,068,888). Dividing catalog price by qsu produces the true per-unit price for every row.

Impact of Normalization

23% of rows (1,395,917) required normalization (qsu > 1).

Raw savings estimate: $317,583
After unit-price normalization: $161,564
Pack-size inflation removed: $156,019 (49.1%)

The Lesson

Before building any comparison logic, find out if your data source has a quantity or pack-size field. It is often there, just not used. In this case, that single field transformed the entire analysis from unreliable to defensible — and changed the headline figure by nearly half.

Three-Tier Comparison Pipeline

Tier 1 — Exact Match
Always runs · No AI
MechanismDict lookup: (mfr, part#)
Latency< 2ms
Matched pairs42,052
Confidence100% — identical by definition
Requires LLMNo

Same manufacturer + same part number at a different vendor = identical product. Price difference is pure savings. This tier alone builds the business case — no AI required, no ambiguity.

Tier 2 — Semantic Retrieval
FAISS · bge-small-en-v1.5
MechanismCosine similarity ≥ 0.85
Latency75–110ms
Index size5.19M vectors · 8GB
Median candidates44 per query
PurposeNarrows 6M → ~50

The embedding model handles scale; the LLM handles reasoning. You never want AI searching 6M rows. The funnel narrows candidates to ~50 before any inference runs.

Tier 3 — LLM Reasoning
qwen2.5:7b · Ollama
MechanismEquivalence + unit norm
Latency5–8 seconds
Input~50 candidates max
Eval score4/4 correctness
Degrades gracefullyYes — Tier 1 still runs

Extracts baseline unit price, checks product type/size/specs/concentration, normalizes pack sizes, rejects accessories. Returns structured JSON; savings calculated deterministically from output.

The LLM's one job is equivalence judgment — not price comparison. Price comparison is arithmetic. Do it in code after the AI runs. Keep the AI's scope narrow, give it explicit rules, and validate its output with spot checks.

Build Sequence & Methodology

01
Understand your data sources before writing any code
Which vendors have purchase history (quantities)? Which are catalog-only? Is there a quantity/pack-size field — anywhere? What is the field completeness per vendor per column? Run value counts on everything. The answers fundamentally change what is possible downstream. In this project, discovering the qsu field changed the headline savings figure by nearly half.
02
Extract the deterministic signal first
Build the exact-match lookup on manufacturer + part number. Apply unit normalization using the pack-size field. Count how many matches exist across vendors. Calculate savings on matched pairs with known quantities. This is your business case. Present it before building anything else. It is also the trust anchor — if exact matching produces defensible numbers, stakeholders will believe the AI-assisted numbers that follow.
03
Build the semantic layer offline, once
Choose an embedding model calibrated for your domain vocabulary. Build the FAISS index offline and reuse it — rebuilding is only necessary if product descriptions change significantly, not just prices. Calibrate the cosine threshold empirically on 20–30 hand-picked queries: enough candidates to find alternatives, not so many the LLM gets confused. This project used cos ≥ 0.85, yielding a median of 44 candidates and max of 137.
04
Design the LLM prompt for your domain's equivalence rules
The prompt must be explicit about what makes two products equivalent in your specific context. Include unit normalization logic directly in the prompt: extract pack size, compute unit price, compare unit prices. List explicit rejection criteria — accessories ≠ products, different sizes ≠ equivalent, different concentrations ≠ equivalent. Ask for structured JSON output; parse it deterministically. Cache LLM responses by prompt hash so repeat queries are free.
05
Build the user interface around selection, not search
The hardest UX problem in procurement tools is anchoring savings to a real number. The critical design decision: make the user select their current product as the baseline before running AI. This eliminates pack-size ambiguity, produces results the user can defend to their organization, and makes the comparison actionable. Provide three paths: natural language search, browse by category, and browse by manufacturer. Every result should be downloadable as a CSV for independent verification.
06
Build the trends layer separately from the comparison tool
The comparison tool answers "is there a cheaper option for this specific product?" The trends dashboard answers "where should we be looking for savings across all of our purchasing?" These are different questions requiring different interfaces. The trends layer uses the pre-computed unit-normalized savings table — no AI required, fully deterministic, loads instantly. Embed the insight in the dashboard: not just "here are the numbers" but "here is what you should look at first and why."

Results & Key Metrics

6,068,888
Catalog rows processed
42,052
Exact-matched product pairs
$161,564
Unit-normalized savings estimate
23%
Rows requiring unit normalization
Price Spread Distribution

Median unit-price spread across all 42,052 matched pairs: 20.7%. Mean: 30.1%. 90th percentile: 95% — meaning 10% of matched products have a price gap where one distributor charges nearly double the other for the same item.

Savings Methodology

The $161,564 figure covers only products where prior-year purchase quantities were available from two of four distributors. An additional 37,535 matched pairs have confirmed price gaps but unknown purchase volume — the true savings potential is larger and unlocked by obtaining quantity data from the remaining vendors.

System Performance

Exact match: <2ms. Semantic retrieval: 75–110ms. Full AI pipeline: 5–8 seconds. LLM evaluated at 4/4 correctness and 4/4 consistency on a 30-query hand-curated eval set. Graceful degradation: Tier 1 always runs when LLM is offline.

Raw vs. Defensible Figure

Raw savings estimate before normalization: $317,583. After applying unit normalization: $161,564. The $156,019 difference was entirely pack-size inflation — identical products listed at different pack quantities making cheaper-per-pack options appear more expensive per unit.

The $161K is the defensible floor, not the ceiling. The single highest-impact next step is obtaining prior-year purchase quantities from the two catalog-only distributors. That one data request could multiply the confirmed savings figure several times over.

Key Lessons

Lesson What it means in practice This project
Find the qty field first Every multi-vendor dataset has a pack-size or quantity field with high coverage that sits unused. Find it, use it, normalize on it before anything else. qsu field — 99.9% coverage — changed the headline savings figure by 49%
Funnel architecture is non-negotiable You cannot run LLM inference on 6M rows. You can run it on 50. Design so that by the time a record reaches AI, it has been through at least two filtering steps. FAISS narrows 6M → ~44 median candidates before any LLM call
Exact matching and semantic matching are complementary Exact matching is fast, reliable, zero AI. Use it first. Semantic matching covers what exact matching misses. Never let semantic replace exact — the false positive rate is too high for a system people will act on. Tier 1 runs in 2ms, always, even when LLM is offline
Normalize before presenting savings A $300K savings figure that is 49% pack-size inflation is not a savings figure — it is a credibility problem. Document the methodology and show the corrected number. $317K raw → $161K normalized, methodology fully documented
AI's job is equivalence judgment, not arithmetic Price comparison is arithmetic — do it in code after the AI runs. The LLM's one job: look at ~50 product descriptions and say which ones are truly the same product as the baseline. Savings calculated deterministically from LLM's structured JSON output
Build for graceful degradation When the LLM is offline, exact-match results should still appear. When the vector index is cold, browse-by-category should still work. A tool that returns nothing when one component fails will not be used. Three independent tiers, each fully functional without the others

Technology Stack

Data & Search

Python · pandas · numpy · pyarrow · Parquet · FAISS IVFFlat · bge-small-en-v1.5 (384-dim embeddings) · Sentence-Transformers

FAISS Parquet bge-small-en pandas
AI & API

qwen2.5:7b via Ollama · FastAPI · Pydantic · uvicorn · JSON-structured LLM output · LLM response caching by prompt hash

qwen2.5:7b FastAPI Ollama Pydantic
Client & Deployment

Vanilla HTML/JS (no framework dependency) · works as a local file or deployed to Azure App Service B3 · Blob Storage for index files · ~$77/month estimated production cost

Azure App Service Blob Storage HTML/JS
← Back to Portfolio   ·   View on GitHub →