
How We Sync 50,000 Products Every Hour Without Breaking a Store

A technical deep-dive into the architecture behind Mr. Banny — a WooCommerce store that synchronizes its entire product catalogue with a wholesale warehouse every hour, while keeping page loads under 2 seconds.

MGKNeT

WordPress & E-commerce Experts

Most WooCommerce integrations are an afterthought. A plugin is installed, a CSV is imported weekly, and everyone hopes the prices roughly match what the warehouse actually has. It works until it doesn't, and when it stops working you're selling products at prices you no longer offer, from stock you no longer have.

Mr. Banny couldn’t operate that way. They sell home and outdoor furniture sourced from a single major wholesale supplier. Their entire catalogue — over 50,000 products — comes from that supplier. Prices change. Stock levels fluctuate. New products arrive. Discontinued lines need to disappear.

They needed the store to reflect reality, not yesterday’s reality. Which meant syncing every hour.

Here’s how we built it.

Why the Obvious Approaches Don’t Work

The first instinct when hearing “import 50,000 products” is to reach for WooCommerce’s built-in importer or a plugin like WP All Import. Run a CSV import every hour, done.

This doesn’t work at scale for several reasons.

It processes everything every time. A naive import reads every row, looks up the corresponding product in the database, and updates it regardless of whether anything changed. 50,000 database writes per hour, for data that’s mostly identical to what was already there. On a live store under real traffic, this saturates your database connections and makes the site slow or unresponsive during the sync window.

It doesn’t handle deletions gracefully. If a product disappears from the supplier’s catalogue, an import that only processes what’s in the current file will leave the old product live indefinitely — showing as in-stock when it’s no longer available.

It has no error recovery. When an import of 50,000 rows fails at row 30,000 due to a malformed record or a brief database timeout, most importers either stop entirely or produce inconsistent partial state. You need to know what failed, why, and what was left incomplete.

It runs as a single blocking process. A PHP import script that runs for 20-40 minutes is a ticking time bomb on any shared or underpowered hosting. Timeouts, memory limits, and competing traffic all threaten it.

We needed something designed differently from the ground up.

The Architecture

The system has three layers: fetch, diff, and apply.

Layer 1: Fetch

Every hour, a scheduled background process connects to the supplier’s API and retrieves the current catalogue. We download this to a temporary staging table in the database — not the live WooCommerce tables. The live store is untouched at this stage.

The staging table is a simplified flat structure: supplier SKU, price, stock quantity, name, description, category path, and a hash of all fields combined. That hash is the key to the next layer.
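The real system computes this hash in PHP as rows land in the staging table; the idea, as a minimal Python sketch (the field names are illustrative, not the actual schema):

```python
import hashlib

def row_hash(row: dict) -> str:
    """Deterministic hash over every field that matters for sync.
    Field order is fixed, so identical data always yields the same hash."""
    fields = ("sku", "price", "stock", "name", "description", "category_path")
    # Join with a separator that cannot appear in the data itself.
    payload = "\x1f".join(str(row.get(f, "")) for f in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

row = {"sku": "TBL-100", "price": "249.00", "stock": 12,
       "name": "Oak Garden Table", "category_path": "outdoor/tables"}
print(row_hash(row))  # stable digest; changes if and only if a field changes
```

Storing one digest per SKU means the next layer can detect "something changed" with a single string comparison, without inspecting any individual field.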

If the API call fails — network timeout, supplier downtime, malformed response — the process logs the failure and exits cleanly. The previous data in the staging table remains untouched. The live store keeps serving from whatever was there before. We get an alert. Nothing breaks.

Layer 2: Diff

This is the part that makes the system fast.

Instead of processing all 50,000 products on every run, we compare the hash for each supplier SKU against the hash we stored from the last successful sync. Only rows where the hash has changed need any action.

On a typical hourly run, this might be 200-800 changed products out of 50,000. Occasionally more if there’s a bulk price update. But rarely the full catalogue.

We also identify three categories of change:

  • Updates: products that exist in WooCommerce and have changed data in the supplier feed
  • New products: SKUs present in the supplier feed that don’t yet exist in WooCommerce
  • Removals: WooCommerce products linked to this supplier that are no longer present in the feed

Processing 500 changes takes seconds. Processing 50,000 takes minutes and hammers the database. The diff step is what makes hourly syncing viable.
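In production this comparison is done with joins between the staging table and the stored hashes; the underlying set logic, sketched in Python with illustrative data:

```python
def diff_catalogue(staged: dict, stored: dict):
    """staged / stored map supplier SKU -> row hash.
    Returns the three change sets the apply layer works from."""
    staged_skus, stored_skus = set(staged), set(stored)
    new      = staged_skus - stored_skus              # in feed, not yet in store
    removals = stored_skus - staged_skus              # in store, gone from feed
    updates  = {s for s in staged_skus & stored_skus
                if staged[s] != stored[s]}            # present in both, hash changed
    return new, updates, removals

new, updates, removals = diff_catalogue(
    staged={"A": "h1", "B": "h2x", "D": "h4"},
    stored={"A": "h1", "B": "h2", "C": "h3"},
)
print(new, updates, removals)  # {'D'} {'B'} {'C'}
```

Unchanged SKUs (like `A` above) fall out of all three sets and cost nothing beyond the hash comparison.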

Layer 3: Apply

Changes are applied as a queue of small, discrete operations — not a single monolithic process.

Each product update is an individual task. We use WordPress’s Action Scheduler (the same background job library WooCommerce uses for its own async operations) to distribute the work. Instead of doing everything in one PHP request that could time out, we’re dispatching hundreds of small jobs that each do one thing: update a single product.

This has several advantages:

Resilience. If one job fails — a database deadlock, a momentary timeout — it retries automatically. The other 499 jobs are unaffected. Failed jobs are logged with the exact error so we can investigate specific products if needed.

Database breathing room. Instead of hammering the database with 500 concurrent writes, jobs process sequentially with small gaps. The database is busy, but not saturated. Real user traffic keeps responding normally.

Visibility. We can see exactly how many jobs are pending, running, and complete. If a sync falls behind for any reason, the queue depth tells us immediately.
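In the real system this is Action Scheduler dispatching one PHP action per product; the shape of the pattern, reduced to a Python sketch (`update_product` is a hypothetical callable standing in for the real per-product handler):

```python
import time

def process_queue(jobs, update_product, max_retries=3, pause=0.05):
    """Apply each change as its own small job. A failure affects only that
    job; it is retried, and if it keeps failing it is logged and skipped."""
    failed = []
    for sku, change in jobs:
        for attempt in range(1, max_retries + 1):
            try:
                update_product(sku, change)   # one product, one small write
                break
            except Exception as exc:
                if attempt == max_retries:
                    failed.append((sku, str(exc)))  # kept for investigation
        time.sleep(pause)  # small gap so real traffic keeps getting served
    return failed
```

The key property is isolation: a deadlock on one SKU never rolls back or blocks the other 499 jobs in the batch.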

Handling the Complexity in the Data

The supplier’s data doesn’t arrive in a format that maps cleanly to WooCommerce. Several translation layers sit between the raw feed and what the store displays.

Category mapping. The supplier uses their own category hierarchy, which doesn’t match what we want to show customers. We maintain a mapping table that translates supplier category paths to store categories. When a new supplier category appears in the feed, it’s flagged for review rather than silently creating a miscategorized product.

Pricing rules. Mr. Banny doesn’t sell at supplier cost. Markup rules vary by category, by product type, and occasionally by specific products that need manual pricing. The sync engine applies these rules during the apply phase, transforming wholesale cost to retail price according to a configurable ruleset. When the ruleset changes, we can trigger a re-price across the relevant products without waiting for the supplier data to change.
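A simplified version of such a ruleset, sketched in Python with hypothetical categories and markups (the real rules and precedence live in the store's configuration):

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative ruleset: the most specific rule wins (SKU > category > default).
RULES = {
    "sku":      {"TBL-100": Decimal("1.35")},
    "category": {"outdoor/tables": Decimal("1.45")},
    "default":  Decimal("1.50"),
}

def retail_price(sku: str, category: str, cost: Decimal) -> Decimal:
    markup = (RULES["sku"].get(sku)
              or RULES["category"].get(category)
              or RULES["default"])
    # Decimal arithmetic avoids float rounding surprises on money.
    return (cost * markup).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(retail_price("TBL-100", "outdoor/tables", Decimal("100.00")))  # 135.00
print(retail_price("CHR-200", "outdoor/tables", Decimal("100.00")))  # 145.00
```

Because pricing is a pure function of cost plus ruleset, a ruleset change can be re-applied across the affected products without touching the supplier data at all, which is exactly what makes the on-demand re-price possible.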

Product variations. Many products come in multiple configurations — different sizes, different materials, different colours. The supplier represents these as separate line items with a parent SKU relationship. We map these to WooCommerce variable products with attributes and variations. Getting this mapping right is one of the more tedious parts of the initial setup, but it’s stable once established.

Image handling. We don’t re-import product images on every sync unless the image URL in the feed has changed. Images are the heaviest part of any product import. Downloading and processing a product image that hasn’t changed is pure waste. We hash the image URLs the same way we hash the product data.

Keeping the Store Fast During Sync

A naive implementation of everything above would still degrade store performance during sync windows because of WooCommerce’s own cache invalidation behaviour.

Every time you update a product in WooCommerce — even a single field — it clears the cache entries for that product, its categories, the shop archive, and related queries. Update 500 products in quick succession and you’re triggering cache invalidation for a significant fraction of your site simultaneously.

We handle this in two ways:

Suppressing redundant cache clears. During a sync operation, we disable WooCommerce’s automatic cache invalidation hooks and instead queue a single targeted cache purge once the batch is complete. The cache clears once for a batch of 500, not 500 times individually.
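In WordPress terms this means detaching the invalidation hooks for the duration of the batch and purging once at the end; the general pattern, as a Python sketch:

```python
class BatchedCache:
    """Collect cache invalidations during a batch, purge once at the end."""
    def __init__(self, purge):
        self._purge = purge      # the real cache-purge callable
        self._dirty = set()
        self._batching = False

    def invalidate(self, key):
        if self._batching:
            self._dirty.add(key)     # defer: just remember what changed
        else:
            self._purge({key})       # normal path: purge immediately

    def __enter__(self):
        self._batching = True
        return self

    def __exit__(self, *exc):
        self._batching = False
        if self._dirty:
            self._purge(self._dirty)  # one purge for the whole batch
            self._dirty.clear()

# 500 updates touching 50 distinct cache keys -> a single purge call.
purge_calls = []
with BatchedCache(lambda keys: purge_calls.append(set(keys))) as cache:
    for i in range(500):
        cache.invalidate(f"product:{i % 50}")
print(len(purge_calls))  # 1
```

Deduplicating the keys also means a category page shared by many updated products is purged once, not once per product.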

Pacing the queue. We schedule the heaviest jobs (new product creation, variation updates) during the lowest-traffic window of the hour — typically the first few minutes after the sync starts — and spread updates over the remainder. If real traffic spikes during a sync, the job processing rate backs off automatically based on server load.

The result is that a visitor landing on a category page during an active sync notices nothing different. The sync is happening, but it’s happening in the background, politely.

What We Monitor

A system this automated only stays reliable with proper observability. We track:

Sync completion rate. Did the last sync run complete? How many products were processed? How many failed? We alert if the failure rate exceeds a threshold or if a sync hasn’t completed within a reasonable window.

Data anomalies. If a sync run would affect more than 20% of the catalogue in a single pass — prices all changing simultaneously, thousands of products disappearing — we flag it for manual review before applying. The most likely explanation is a supplier data error, not a genuine mass update.
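The guard itself is a simple ratio check that runs before the apply phase (the 20% threshold matches the one above; the function shape is illustrative):

```python
def should_hold_sync(changed: int, removed: int, total: int,
                     threshold: float = 0.20) -> bool:
    """Return True if the sync should be held for manual review.
    A run touching more than `threshold` of the catalogue is more likely
    a supplier data error than a genuine mass update."""
    if total == 0:
        return True  # an empty feed is itself suspicious
    return (changed + removed) / total > threshold

print(should_hold_sync(changed=600, removed=50, total=50_000))       # False: normal run
print(should_hold_sync(changed=9_000, removed=4_000, total=50_000))  # True: hold it
```

The asymmetry is deliberate: applying a bad feed delists thousands of live products, while holding a genuine mass update merely delays it by one review.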

Price drift. We log a sample of pricing calculations on every run and alert if the applied prices are outside expected ranges. This catches both supplier data errors and ruleset misconfiguration before customers see wrong prices.

Queue depth. If jobs are piling up faster than they’re being processed, something is wrong — either the server is overloaded or a job is failing repeatedly. The queue depth metric surfaces this early.

What This Makes Possible

The business outcome is straightforward: Mr. Banny’s catalogue reflects reality within an hour of any change at the supplier. When a product goes out of stock, it shows as unavailable. When prices change, the store price updates automatically according to the margin rules. When new products arrive in the warehouse, they appear in the store.

This removes an entire category of operational work — manual price updates, stock checks, product maintenance — and replaces it with a system that runs 24 times a day without anyone touching it.

The engineering investment was significant. But it was a one-time investment in infrastructure that has run reliably since 2022, processing millions of product updates without manual intervention.

The Broader Lesson

Every serious e-commerce integration eventually bumps into the same problems: scale, reliability, and the gap between what a plugin can do and what your business actually needs. Off-the-shelf solutions are designed for the common case. When your case isn’t common, you need something built for your specific requirements.

The patterns in this architecture — delta synchronization, background job queuing, staged application, observability — apply beyond WooCommerce. They’re the same patterns used in any data pipeline that needs to be both fast and reliable.

If you’re dealing with catalogue integrations, warehouse synchronization, or any external data feed that your store depends on, the question isn’t whether to build something custom. It’s whether the business justifies the engineering investment. For Mr. Banny, it obviously did. For most stores doing meaningful volume with a supplier dependency, it usually does.
