Automated Global Catalog Audit System

Role Solo — Design, Engineering, Deploy

Stack Python · Pandas · openpyxl

Scale 12 Countries · Thousands of SKUs

Impact 16 hrs → under 5 min (99% reduction)

Every two weeks, someone on the team spent 16 hours manually verifying thousands of SKUs across 12 countries. I spent two evenings writing a Python script. That script has never called in sick.

16h → 5m

Audit time

Countries covered

99%

Time saved

Human errors

01 · Context

The Problem

At ATHIBAN Ecommerce, the bi-weekly global catalog audit was one of the most critical — and most painful — tasks in the workflow. An analyst would open twelve different country-specific data exports, verify thousands of SKUs against a growing list of business rules (title length, image count, required attributes, pricing logic, keyword compliance), flag every violation, format a per-country issue report, and hand it off to the operations team.

The process had three compounding problems:

Time. 16 hours per cycle, every two weeks. That's a full two working days per month consumed by a single repeatable task.
Error rate. At the scale of thousands of SKUs across twelve formats, human attention drifts. Issues were missed. Country-specific rules were misapplied. Reports had inconsistencies.
Scalability wall. As the catalog grew and we added new markets, the audit time grew with it. More countries meant more hours — the process was heading toward a dedicated headcount just to sustain it.

The moment I decided to build this

I watched a colleague spend 14 hours on an audit, only to discover the next day that three of the country exports had been updated — making half the work obsolete. That was the last time this was done manually.

02 · Goal

The Design Goals

Full automation. Zero manual steps between "run the script" and "here's your audit report."
Multi-country normalization. One script handles twelve different export formats without configuration changes per run.
Rule-based validation. Business rules should be configurable — not hardcoded — so the ops team can update them without touching code.
Actionable output. Reports should be ready to act on, not a list of raw errors. Country-specific files, pre-formatted for the operations workflow.
Run anywhere. No dependencies beyond Python and a data folder. Any team member should be able to trigger it.

03 · How I Built It

The Pipeline

The script runs as a single pipeline with three sequential stages:

📥 Stage 1 · Ingestion

Reads all 12 country export files from one input folder
Normalizes headers, encodings, and data types per country schema
Merges into a single validated master DataFrame

⚙️ Stage 2 · Validation

Applies configurable business rules per attribute per country
Checks title length, image requirements, mandatory fields, pricing logic, keyword compliance
Flags each violation with rule code + severity level

📤 Stage 3 · Reporting

Generates per-country issue files in ops-ready format
Produces a summary dashboard with pass/fail rates per country
Isolates critical violations (suppression risks) in their own file

Pipeline at a Glance

Input

📁 12 Country Export Files

📋 Business Rules Config

↓

Process

Pandas Normalization

Rule Engine

Violation Flagging

↓

Output

Country Issue Files

Audit Summary Dashboard

Critical Violation Report

04 · Impact

Before vs. After

Audit time per cycle

Before

16 hours

After

Under 5 minutes

Human error rate

Before

Significant (scale-dependent)

After

Zero

Scalability

Before

Linear — more markets, more hours

After

Flat — 1 or 20 countries, same 5 min

Audit cadence

Before

Bi-weekly (too expensive to run more)

After

On demand — before every major update

The script ran for the first time on a Wednesday evening and finished in 4 minutes and 38 seconds. The audit it replaced would have taken the rest of the week.

— First production run, ATHIBAN Ecommerce catalog audit

05 · What I Learned

Hard-Won Lessons

Config-driven rules beat hardcoded logic. The first version baked business rules into the code. When rules changed (they always do), updates required a developer. Moving rules to a config file let the ops team manage them independently.
Output format matters as much as detection. Finding the violation is half the job. The ops team needs to act on it — so the report format has to match how they work, not how the data is stored.
Automation ROI is immediate when the manual process is painful enough. This script paid for its build time on the very first run. Some automations take months to justify; this one repaid the investment in hours.
Once you automate it, you run it more often. We went from bi-weekly audits (too expensive to run more often) to running on demand, before every major catalog update. The frequency gain was as valuable as the time saving.

📦 View on GitHub 💬 Discuss This Project

Running a manual catalog audit process? I can automate it. Get in touch →

Automated Global Catalog Audit System

The Problem

The Design Goals

The Pipeline

📥 Stage 1 · Ingestion

⚙️ Stage 2 · Validation

📤 Stage 3 · Reporting

Pipeline at a Glance

Before vs. After

Hard-Won Lessons

Need an analyst who can eliminate the manual work?