Solo — Design, Engineering, Deploy
Python · Pandas · openpyxl
12 Countries · Thousands of SKUs
16 hrs → under 5 min (99% reduction)

Every two weeks, someone on the team spent 16 hours manually verifying thousands of SKUs across 12 countries. I spent two evenings writing a Python script. That script has never called in sick.

16h → 5m
Audit time
12
Countries covered
99%
Time saved
0
Human errors

The Problem

At ATHIBAN Ecommerce, the bi-weekly global catalog audit was one of the most critical — and most painful — tasks in the workflow. An analyst would open twelve different country-specific data exports, verify thousands of SKUs against a growing list of business rules (title length, image count, required attributes, pricing logic, keyword compliance), flag every violation, format a per-country issue report, and hand it off to the operations team.

The process had three compounding problems:

  • Time. 16 hours per cycle, every two weeks. That's a full two working days per month consumed by a single repeatable task.
  • Error rate. At the scale of thousands of SKUs across twelve formats, human attention drifts. Issues were missed. Country-specific rules were misapplied. Reports had inconsistencies.
  • Scalability wall. As the catalog grew and we added new markets, the audit time grew with it. More countries meant more hours — the process was heading toward a dedicated headcount just to sustain it.
The moment I decided to build this

I watched a colleague spend 14 hours on an audit, only to discover the next day that three of the country exports had been updated — making half the work obsolete. That was the last time this was done manually.

The Design Goals

  • Full automation. Zero manual steps between "run the script" and "here's your audit report."
  • Multi-country normalization. One script handles twelve different export formats without configuration changes per run.
  • Rule-based validation. Business rules should be configurable — not hardcoded — so the ops team can update them without touching code.
  • Actionable output. Reports should be ready to act on, not a list of raw errors. Country-specific files, pre-formatted for the operations workflow.
  • Run anywhere. No dependencies beyond Python and a data folder. Any team member should be able to trigger it.

The Pipeline

The script runs as a single pipeline with three sequential stages:

📥 Stage 1 · Ingestion

  • Reads all 12 country export files from one input folder
  • Normalizes headers, encodings, and data types per country schema
  • Merges into a single validated master DataFrame

⚙️ Stage 2 · Validation

  • Applies configurable business rules per attribute per country
  • Checks title length, image requirements, mandatory fields, pricing logic, keyword compliance
  • Flags each violation with rule code + severity level

📤 Stage 3 · Reporting

  • Generates per-country issue files in ops-ready format
  • Produces a summary dashboard with pass/fail rates per country
  • Isolates critical violations (suppression risks) in their own file

Pipeline at a Glance

Input
📁 12 Country Export Files
📋 Business Rules Config
Process
Pandas Normalization
Rule Engine
Violation Flagging
Output
Country Issue Files
Audit Summary Dashboard
Critical Violation Report

Before vs. After

Audit time per cycle
Before
16 hours
After
Under 5 minutes
Human error rate
Before
Significant (scale-dependent)
After
Zero
Scalability
Before
Linear — more markets, more hours
After
Flat — 1 or 20 countries, same 5 min
Audit cadence
Before
Bi-weekly (too expensive to run more)
After
On demand — before every major update

The script ran for the first time on a Wednesday evening and finished in 4 minutes and 38 seconds. The audit it replaced would have taken the rest of the week.

— First production run, ATHIBAN Ecommerce catalog audit

Hard-Won Lessons

  • Config-driven rules beat hardcoded logic. The first version baked business rules into the code. When rules changed (they always do), updates required a developer. Moving rules to a config file let the ops team manage them independently.
  • Output format matters as much as detection. Finding the violation is half the job. The ops team needs to act on it — so the report format has to match how they work, not how the data is stored.
  • Automation ROI is immediate when the manual process is painful enough. This script paid for its build time on the very first run. Some automations take months to justify; this one repaid the investment in hours.
  • Once you automate it, you run it more often. We went from bi-weekly audits (too expensive to run more often) to running on demand, before every major catalog update. The frequency gain was as valuable as the time saving.

Running a manual catalog audit process? I can automate it. Get in touch →

Need an analyst who can eliminate the manual work?

This automation is one of several tools I've built to replace time-consuming e-commerce workflows.

Let's Talk → See Other Projects