Every two weeks, someone on the team spent 16 hours manually verifying thousands of SKUs
across 12 countries. I spent two evenings writing a Python script. That script has never
called in sick.
01 · Context
The Problem
At ATHIBAN Ecommerce, the bi-weekly global catalog audit was one of
the most critical — and most painful — tasks in the workflow. An analyst would open twelve
different country-specific data exports, verify thousands of SKUs against a growing list of
business rules (title length, image count, required attributes, pricing logic, keyword
compliance), flag every violation, format a per-country issue report, and hand it off to the
operations team.
The process had three compounding problems:
- Time. 16 hours per cycle, every two weeks. That's a full two working days per
month consumed by a single repeatable task.
- Error rate. At the scale of thousands of SKUs across twelve formats, human
attention drifts. Issues were missed. Country-specific rules were misapplied. Reports had
inconsistencies.
- Scalability wall. As the catalog grew and we added new markets, the audit time
grew with it. More countries meant more hours — the process was heading toward a dedicated
headcount just to sustain it.
The moment I decided to build this
I watched a colleague spend 14 hours on an audit, only to discover the next day that three of
the country exports had been updated — making half the work obsolete. That was the
last time this was done manually.
02 · Goal
The Design Goals
- Full automation. Zero manual steps between "run the script" and "here's your
audit report."
- Multi-country normalization. One script handles twelve different export
formats without configuration changes per run.
- Rule-based validation. Business rules should be configurable — not hardcoded —
so the ops team can update them without touching code.
- Actionable output. Reports should be ready to act on, not a list of raw
errors. Country-specific files, pre-formatted for the operations workflow.
- Run anywhere. No dependencies beyond Python and a data folder. Any team member
should be able to trigger it.
03 · How I Built It
The Pipeline
The script runs as a single pipeline with three sequential stages:
📥 Stage 1 · Ingestion
- Reads all 12 country export files from one input folder
- Normalizes headers, encodings, and data types per country schema
- Merges into a single validated master DataFrame
⚙️ Stage 2 · Validation
- Applies configurable business rules per attribute per country
- Checks title length, image requirements, mandatory fields, pricing logic, keyword
compliance
- Flags each violation with rule code + severity level
📤 Stage 3 · Reporting
- Generates per-country issue files in ops-ready format
- Produces a summary dashboard with pass/fail rates per country
- Isolates critical violations (suppression risks) in their own file
Pipeline at a Glance
Input
📁 12 Country Export Files
📋 Business Rules Config
↓
Process
Pandas Normalization
Rule Engine
Violation Flagging
↓
Output
Country Issue Files
Audit Summary Dashboard
Critical Violation Report
04 · Impact
Before vs. After
Human error rate
Before
Significant (scale-dependent)
Scalability
Before
Linear — more markets, more hours
After
Flat — 1 or 20 countries, same 5 min
Audit cadence
Before
Bi-weekly (too expensive to run more)
After
On demand — before every major update
The script ran for the first time on a Wednesday evening and finished in 4 minutes and 38
seconds. The audit it replaced would have taken the rest of the week.
— First production run, ATHIBAN Ecommerce catalog audit
05 · What I Learned
Hard-Won Lessons
- Config-driven rules beat hardcoded logic. The first version baked business
rules into the code. When rules changed (they always do), updates required a developer.
Moving rules to a config file let the ops team manage them independently.
- Output format matters as much as detection. Finding the violation is half the
job. The ops team needs to act on it — so the report format has to match how they work, not
how the data is stored.
- Automation ROI is immediate when the manual process is painful enough. This
script paid for its build time on the very first run. Some automations take months to justify;
this one repaid the investment in hours.
- Once you automate it, you run it more often. We went from bi-weekly audits
(too expensive to run more often) to running on demand, before every major catalog update. The
frequency gain was as valuable as the time saving.
Running a manual catalog audit process? I can automate it. Get in touch →