AI Demand Forecasting for Apparel: What Works and What Overpromises

By Shubham Singh · Reviewed by Venkat Koripalli · 10 min read

A planner at a $15M contemporary brand opens her Monday with three browser tabs. One is a forecasting tool the brand bought last year, projecting unit sales by SKU for the next 13 weeks. The second is the Shopify admin showing last week’s actuals, which are off by 40 percent on the top five styles. The third is a spreadsheet from the wholesale team showing 1,200 units of a hero style already committed to two majors for an August ship window. The forecast does not know about the wholesale commit. The actuals do not match because returns from the last drop have not been posted yet. She closes the forecast tab and goes back to the spreadsheet. This is the gap between AI demand forecasting for apparel as a category pitch and AI demand forecasting as something a planner actually trusts on a Monday morning.

What is AI demand forecasting for apparel, precisely?

AI demand forecasting for apparel is the use of machine learning models, typically gradient-boosted trees or sequence models, to predict unit sales by SKU, size, color, channel, and week, using historical sales, returns, price, promotion, weather, and attribute data as inputs. The output is a probabilistic forecast (often a median plus a confidence band) that feeds buying, allocation, replenishment, and production decisions.
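To make the "median plus a confidence band" output concrete, here is a minimal Python sketch for a single SKU. It uses empirical percentiles of past weekly units as a stand-in for a trained model, so it illustrates the shape of the output, not how gradient-boosted trees or sequence models produce it. The function name `forecast_band`, the 10th and 90th percentile choice, and the sample history are all assumptions made for illustration.

```python
from statistics import median, quantiles

def forecast_band(weekly_units, lower_pct=10, upper_pct=90):
    """Return (low, mid, high) for next week's units from past weekly sales.

    Naive stand-in for a model: the band is the empirical lower and upper
    percentiles of history, and mid is the median.
    """
    # quantiles(..., n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = quantiles(weekly_units, n=100, method="inclusive")
    return cuts[lower_pct - 1], median(weekly_units), cuts[upper_pct - 1]

# Hypothetical eight weeks of unit sales for one SKU-size in one channel.
history = [40, 55, 38, 60, 47, 52, 45, 58]
low, mid, high = forecast_band(history)
print(f"p10={low:.1f}  median={mid:.1f}  p90={high:.1f}")
```

A buying or allocation decision would typically read the band, not just the midpoint: a wide band on a style is itself a signal to commit fewer units up front.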

That definition matters because most vendor pitches collapse three different problems into one word. Forecasting a replenishment basic with three years of history is not the same problem as forecasting a fashion drop with no history. Forecasting total demand for a style is not the same problem as forecasting the size curve within that style. Forecasting DTC demand is not the same problem as forecasting wholesale demand, because wholesale demand in the early weeks of a season is not a forecast at all. It is a set of POs.

If a tool treats all three as the same problem, the planner ends up doing what the planner in the opening scene did. She closes the tab.

Why do most AI forecasting tools overpromise on apparel?

From the fit calls I run with prospects each week, the pattern is consistent. The brand bought a forecasting tool because oversell at peak was running 2 to 3 percent and the buying team was making decisions in a spreadsheet. Six months in, the tool produces a forecast, but nobody on the planning team uses it for the decisions that actually move money. The reasons are almost always the same three.

First, the product data going into the model is fragmented. Style attributes (fabric, silhouette, sleeve length, fit, occasion) live partly in the PLM, partly in Shopify, partly in a Google Sheet the design team maintains. The model cannot learn that a relaxed linen camp shirt sells like other relaxed linen camp shirts if the attribute tagging is inconsistent across systems. This is Breakpoint 1 of the 6 Breakpoints framework, product data fragmentation, and it eats forecast accuracy before the model has a chance.

Second, the inventory truth the model uses for the actuals is wrong. At a $15M brand running wholesale plus DTC plus 3PL, planners are spending 6 to 9 hours a week reconciling inventory across Shopify, the 3PL, and wholesale. Returns post to inventory on a weekly batch instead of within days. Wholesale-committed units show as available. The forecast is being trained and evaluated against a sales signal that is itself unreliable. This is Breakpoint 3, and no model fixes it.

Third, the tool forecasts demand for SKUs the brand will not sell again. Roughly half the assortment at most contemporary brands turns over each season. A forecast for a style with no comparable history is not really a forecast. It is an attribute-based comp pull dressed up in a confidence interval. That can be useful, but only if the user understands what they are looking at.
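For readers who want to see what an attribute-based comp pull amounts to, here is a rough Python sketch. It scores past styles by attribute overlap (Jaccard similarity) with the new style and reports the range of their early sales. The function name `comp_pull`, the 0.5 overlap threshold, the attribute tags, and the unit figures are hypothetical; a vendor's actual method may differ.

```python
def comp_pull(new_attrs, history, min_overlap=0.5):
    """Return (low, average, high) first-four-week units from comparable styles.

    history is a list of (attribute_set, first_4wk_units) for past styles.
    Returns None when no past style clears the overlap threshold.
    """
    comps = []
    for attrs, units in history:
        # Jaccard similarity: shared attributes over all attributes seen.
        overlap = len(new_attrs & attrs) / len(new_attrs | attrs)
        if overlap >= min_overlap:
            comps.append(units)
    if not comps:
        return None
    return min(comps), sum(comps) / len(comps), max(comps)

# Hypothetical new style and past styles.
new_style = {"linen", "relaxed", "camp collar", "short sleeve"}
past = [
    ({"linen", "relaxed", "camp collar", "short sleeve"}, 320),
    ({"linen", "relaxed", "camp collar", "long sleeve"}, 280),
    ({"ponte", "fit-and-flare", "sleeveless"}, 900),
]
result = comp_pull(new_style, past)
print(result)
```

The sketch only works if the attribute tags are consistent across styles, which is the product data point made earlier: if one system says "camp collar" and another says "camp", the overlap score drops and the comp set shrinks.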

What actually works in apparel demand forecasting?

There are three forecasting problems where machine learning is genuinely better than a planner with a spreadsheet, and brands should invest there first.

The first is replenishment forecasting for core and carryover styles with two or more seasons of clean history. A core white tee, a signature dress silhouette that returns every spring, a denim wash that has been in the line for three years. These styles have enough data for a model to pick up seasonality, price elasticity, and the lift from email sends or paid campaigns. A weekly forecast with a 4 to 13 week horizon, refreshed against actuals, will outperform a planner’s spreadsheet, especially at the SKU-size level where humans collapse to averages.

The second is size curve prediction within a known silhouette. Even on a brand new style, if the model knows the silhouette (a fit-and-flare dress in stretch ponte), it can predict the size curve from comparable styles with reasonable accuracy. Getting the size curve right matters more than getting total units right, because broken size runs are what actually cause markdowns. A style that sold out of M and L while sitting on XS and XXL is a margin event, not a demand event.
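A simplified Python sketch of the idea: pool size-level sales from comparable styles into a size curve, then split a planned buy across sizes. This is a plain averaging approach under assumed inputs, not a trained model; the function names `size_curve` and `allocate`, the comp data, and the 600-unit buy are hypothetical.

```python
from collections import Counter

def size_curve(comp_sales):
    """Pool units sold by size across comparable styles into shares that sum to 1."""
    totals = Counter()
    for style in comp_sales:
        totals.update(style)  # adds each size's units to the running total
    grand_total = sum(totals.values())
    return {size: units / grand_total for size, units in totals.items()}

def allocate(total_units, curve):
    """Split a buy across sizes; largest-remainder rounding keeps the sum exact."""
    raw = {size: total_units * share for size, share in curve.items()}
    alloc = {size: int(value) for size, value in raw.items()}
    leftover = total_units - sum(alloc.values())
    # Hand leftover units to the sizes with the largest fractional remainder.
    by_remainder = sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)
    for size in by_remainder[:leftover]:
        alloc[size] += 1
    return alloc

# Hypothetical units sold by size for two comparable fit-and-flare styles.
comps = [
    {"XS": 10, "S": 30, "M": 40, "L": 20},
    {"XS": 5, "S": 25, "M": 45, "L": 25},
]
curve = size_curve(comps)
buy = allocate(600, curve)
print(buy)
```

One caveat worth keeping in mind: if a comp style stocked out in M, its sales understate M demand, so a curve built on sales alone inherits the broken size run it is meant to prevent.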

The third is markdown timing and depth on aging inventory. Models that price-test against sell-through and weeks of cover can recommend the right markdown cadence better than a calendar-based rule. This is mostly a DTC problem, not a wholesale one.
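The two inputs named here, sell-through and weeks of cover, are simple to compute. The Python sketch below shows them with a naive review flag; the flag is a fixed rule for illustration, not the price-testing model the paragraph describes, and the function names and numbers are assumptions.

```python
def weeks_of_cover(on_hand, recent_weekly_units):
    """On-hand units divided by the recent average weekly sales rate."""
    rate = sum(recent_weekly_units) / len(recent_weekly_units)
    return float("inf") if rate == 0 else on_hand / rate

def sell_through(units_sold, units_received):
    """Share of received units sold so far."""
    return units_sold / units_received

def needs_markdown_review(on_hand, recent_weekly_units, weeks_left_in_season):
    """Flag a style whose stock would outlast the selling window at the current rate."""
    return weeks_of_cover(on_hand, recent_weekly_units) > weeks_left_in_season

# Hypothetical aging style: 600 received, 360 sold, and a slowing weekly rate.
woc = weeks_of_cover(on_hand=240, recent_weekly_units=[30, 20, 10])
st = sell_through(units_sold=360, units_received=600)
flag = needs_markdown_review(240, [30, 20, 10], weeks_left_in_season=6)
print(f"weeks of cover={woc:.1f}  sell-through={st:.0%}  review={flag}")
```

A model earns its place over this rule by estimating how the sales rate responds to each markdown depth, which the fixed rule cannot do.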

These three problems share a property. They all require clean attribute data on the product, accurate sales and return data on the actuals, and a clear separation between channels. None of them require AI for the front-page pitch. They require AI for the unglamorous middle.

What overpromises and why?

The objections I hear most often in evaluations are about the parts that overpromise. Three patterns show up.

The first is new drop forecasting for fashion-forward styles. Vendors will demo a model that takes images, attributes, and trend signals and projects week-one sell-through for a brand new SKU. The accuracy on these is meaningfully worse than the demo suggests, because the variance on a new fashion drop is dominated by factors the model cannot see, including the email send the brand decides to schedule three days before launch, the influencer post that may or may not happen, and the weather in the brand’s top three metro areas during launch week. A merchant’s gut, anchored to a clear comp set, is competitive with the model here. The model is useful as a sanity check, not as the decision.

The second is forecasting wholesale demand as if it were a probabilistic signal. Wholesale demand in the early weeks of a season is not a forecast. It is a set of POs from named accounts with specific ship windows. The planning problem is allocation against a committed pool, not prediction. A forecasting tool that shows DTC and wholesale on the same chart, both as forecast lines, is treating different objects as the same object. Magnolia Pearl ships same-day on DTC orders during a drop while also managing wholesale ship windows and international duty implications, and those two demand streams are run with different logic, not the same model.

The third is automated buying. A model that recommends buy quantities for next season is doing a useful thing, but the recommendation has to land in a buying workflow where a planner can override at the style, color, and size level, with the reasoning captured. Tools that pitch hands-off automated buying for apparel are pitching to a problem that does not exist at the $5M to $100M band. The buying team is the merchandising team. They are not going to delegate.

How does channel-aware ATS change the forecasting problem?

This is the part of the conversation most forecasting vendors skip, and it is the part that matters most for a wholesale plus DTC brand. Available-to-sell at a SKU-size level is not one number. It is a number per channel, because the same unit cannot be promised to both a Shopify shopper and a wholesale account.

If the brand runs a single inventory pool and lets the first order in win, the wholesale commit is silently eaten by DTC, and chargebacks follow when the wholesale ship window arrives. If the brand carves out a wholesale-committed pool, the DTC forecast has to be built against the DTC-available pool, not total inventory. Most forecasting tools do not understand this distinction. They forecast demand against total stock and recommend buys that assume all units are interchangeable. They are not.
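The difference between a single pool and a carved-out pool can be sketched in a few lines of Python. The example reuses the 1,200-unit wholesale commit from the opening scene; the on-hand and DTC-reserved figures, the `SkuInventory` fields, and the function names are hypothetical, and a real system would also need ship windows and inbound stock.

```python
from dataclasses import dataclass

@dataclass
class SkuInventory:
    on_hand: int
    wholesale_committed: int  # units promised on open wholesale POs
    dtc_reserved: int = 0     # units in DTC orders not yet shipped

def single_pool_ats(inv):
    """Naive view: every unit not in an open DTC order looks sellable to DTC."""
    return max(inv.on_hand - inv.dtc_reserved, 0)

def ats_by_channel(inv):
    """Carve out the wholesale commit first; DTC sells only from what remains."""
    wholesale_pool = min(inv.wholesale_committed, inv.on_hand)
    dtc_ats = max(inv.on_hand - wholesale_pool - inv.dtc_reserved, 0)
    return {"wholesale_committed": wholesale_pool, "dtc": dtc_ats}

# Hypothetical hero style: 2,000 on hand, 1,200 committed to wholesale.
hero = SkuInventory(on_hand=2000, wholesale_committed=1200, dtc_reserved=150)
naive = single_pool_ats(hero)
channel = ats_by_channel(hero)
print(naive, channel)
```

In this example the naive view offers 1,850 units to DTC while the channel-aware view offers 650. A forecast or buy recommendation built on the first number assumes 1,200 units that are already spoken for.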

Lufema runs multiple wholesale entities and a B2B portal with a multi-brand catalog. The forecasting question for that kind of structure is not just how many units of style X will sell. It is how many units of style X by entity, by account tier, by ship window, with what carry-over into the next drop. A model that does not see those dimensions cannot answer the question. A planner working in a spreadsheet at least knows what the spreadsheet does not contain.

The POV here is simple. Wholesale should not run through DTC inventory logic, and wholesale demand should not run through DTC forecasting logic. They are different problems with different time horizons and different sources of truth.

What is the right sequence to get value from AI forecasting?

Brands at the $10M to $20M breakpoint zone often look at forecasting tools as the answer to a planning problem, and the sequence they walk into makes the tool fail. The order that works is the inverse.

  1. Fix product data fragmentation first. Style attributes live in one place, with consistent taxonomy, and flow to every downstream system. Without this, no model learns what a comparable style is.

  2. Fix inventory truth next. Returns post to inventory in days, not weeks. The 3PL feed reconciles daily. Wholesale-committed pools are explicitly carved out. Channel-aware ATS is calculated, not estimated.

  3. Run weekly open-to-buy (OTB) during selling season against a real forecast for replenishment styles and a comp-based projection for new drops. Monthly OTB is too slow when DTC sell-through can swing 30 percent in a week and a hero style can sell out in 72 hours.

  4. Layer size curve prediction onto the buying workflow, so the model assists on the dimension where humans are weakest.

  5. Only then evaluate vendor tools that pitch new drop forecasting or automated allocation, because by that point the brand has the data foundation to tell whether the tool is actually better than the planner.

Most brands try to start at step 5 and end up where the planner at the start of this post ended up, with three browser tabs and a spreadsheet.
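The weekly open-to-buy refresh in step 3 rests on a standard piece of retail arithmetic, sketched here in Python in units rather than dollars. The function name and all figures are hypothetical; the second call applies the 30 percent sell-through swing mentioned in step 3 to show why a monthly cadence lags.

```python
def open_to_buy(planned_sales, planned_end_inventory, on_hand, on_order):
    """Units still open to buy for the period.

    Need (planned sales plus the stock you want left at period end)
    minus supply already owned (on hand plus on order).
    """
    return planned_sales + planned_end_inventory - on_hand - on_order

# Hypothetical plan for one style over the remaining selling window.
otb_plan = open_to_buy(
    planned_sales=500, planned_end_inventory=800, on_hand=900, on_order=200
)

# One week later the forecast is revised up 30 percent (500 -> 650).
otb_revised = open_to_buy(
    planned_sales=650, planned_end_inventory=800, on_hand=900, on_order=200
)
print(otb_plan, otb_revised)
```

The formula is only as good as its inputs: `on_hand` has to be the channel-aware, returns-adjusted number from step 2, which is why the sequence puts inventory truth ahead of OTB.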

What this means for an apparel operations team

AI demand forecasting is real, and on the right problems (replenishment, size curves, markdown timing) it earns its keep. The category does not need defending. What it needs is a sharper buyer.

The operations team’s job in an evaluation is to push every vendor on three questions. What does the model do on a SKU with no history, and how is that different from what it does on a SKU with three seasons of history? How does the tool handle channel-separated inventory and wholesale-committed pools? What product data does the model require, and what is the brand’s current state against that requirement? The answers will tell you whether the tool is going to ship value in six months or sit in a browser tab.

The broader pattern is the one the 6 Breakpoints framework keeps surfacing. The forecasting problem is downstream of the product data problem and the inventory truth problem. Fix those, and the forecasting tool you eventually buy will work. Skip them, and the tool you bought last year is still sitting in that first browser tab, the one the planner closes.

Written by
Shubham Singh
Solutions Consultant, Apparel Operations, Uphance

Shubham writes about evaluating ERP fit, assessing operational complexity, and how apparel brands can tell whether their current systems are helping or holding them back. As a Solutions Consultant at Uphance, he runs discovery conversations and fit assessments for apparel brands moving off patchwork stacks of PLM, PIM, inventory, and B2B tools. His articles cover ERP selection, vendor RFPs, comparison frameworks, and the operational signals that tell a brand it has outgrown spreadsheets and point solutions. He focuses on how mid-market apparel teams evaluate connected platforms against the cost of staying with what they have.

Reviewed by
Venkat Koripalli
Founder & CEO, Uphance

Venkat is the Founder and CEO of Uphance and the author of the 6 Breakpoints of Apparel Operations framework. He writes about operational clarity for apparel brands as complexity grows across channels, warehouses, partners, and teams. His work focuses on why disconnected operations, not growth itself, create the chaos most mid-market brands feel between $5M and $100M in revenue, and on the operating-model patterns that decide whether scaling a brand strengthens execution or fractures it. He argues that the status quo is the real competitor in apparel software, and that the right move is fewer systems with deeper connection, not more dashboards.
