A Factory Performance Scorecard for Apparel Brands, Season Over Season
It is week 11 of a 14-week production calendar. The merchandising lead is on a call with a factory in Tirupur, pushing on a shipment that was supposed to leave port last Friday. The production manager pulls up a shared spreadsheet with the season’s POs, hunts for the right tab, and reads off a ship date that does not match what the factory just said on the call. Nobody can find the original lab dip approval. Nobody can say whether this same factory was late last season, or the season before. The decision in the room is whether to keep the vendor for Resort. The decision is being made on vibes.
What does it actually mean to track apparel factory performance season over season?
Tracking apparel factory performance season over season means maintaining a per-vendor record of how each factory performed against the plan, captured at consistent checkpoints, and read as a trend across at least three seasons. It is not a one-time audit. It is not a relationship score from the sourcing lead. It is a structured scorecard with the same metrics applied to every vendor on every PO, accumulating in the system that already holds your production data.
The metrics that matter are narrow and operational. On-time delivery against the committed ex-factory date. First-pass quality at inbound inspection. Spec accuracy against the approved tech pack. PO-to-receipt unit variance. Lead time variance versus quoted lead time. Sample round count before approval. Cost variance, measured as quoted FOB versus invoiced FOB. Seven metrics, scored per PO, rolled up per factory per season.
This sits squarely inside BP2 of the 6 Breakpoints framework, where production and supply execution drift from the plan. The drift itself is normal. The failure is that the drift never gets captured as data, so the next season’s sourcing decisions repeat the same mistakes with the same vendors.
Why do most apparel brands fail to track factory performance at all?
When I sit in on a customer kickoff, the production team almost always claims they know which factories are reliable. They will name two or three good ones and one or two problem ones. Then we ask for the data behind that claim and we get a spreadsheet that tracks PO numbers and ship dates but does not capture delay reasons, quality rejects, or sample iteration counts. The institutional knowledge lives in one person’s head. That person is usually the production manager, and they are usually leaving in eighteen months.
There are three structural reasons brands do not track factory performance properly.
The first is that the production data lives in too many places. The PO is in one system, the tech pack is in PLM or a Dropbox folder, the inbound QC report is on paper or in a WhatsApp thread, and the ex-factory date lives in an email confirmation from the agent. There is no single record per PO that captures plan versus actual.
The second is that the metrics are not standardized across vendors. A factory in Vietnam gets graded on one set of expectations, a factory in Portugal on another, and a domestic CMT shop on a third. Without a common scoring rubric, you cannot compare them, and without comparison you cannot decide where to place the next season’s volume.
The third is that the data, even when it exists, never gets read at the right moment. Sourcing decisions for Fall happen in February. The Spring delivery problems happen in March. By the time anyone looks back, the next season’s POs are already issued.
What belongs on the scorecard, and what does not?
A factory scorecard should be short enough to fit on one page per vendor per season. If it sprawls, nobody reads it.
Here is what belongs on it.
- On-time delivery percentage. Of the POs that shipped this season, what percentage hit the committed ex-factory date within a two-day tolerance. Measure to the date the goods physically left the factory, not the date the factory said they shipped.
- First-pass quality rate. Of the units received, what percentage passed inbound QC on the first inspection. Track the rejection reasons separately, because a vendor with 4 percent rejects on stitching is a different problem than a vendor with 4 percent rejects on color matching.
- Spec accuracy. How many style-level deviations from the approved tech pack were found at receipt. Wrong trim, wrong wash, wrong label placement, wrong fiber content.
- PO-to-receipt unit variance. The factory committed to 1,200 units across the size run. You received 1,147. The variance is 4.4 percent under. Track over and under separately, because chronic shorts and chronic overs are different operational failures.
- Lead time variance. Quoted lead time versus actual lead time, in days. A factory that consistently runs five days long is plannable. A factory that runs anywhere from on-time to three weeks late is not.
- Sample iteration count. How many rounds of sampling before PP approval. This is a leading indicator for production problems, not a vanity metric.
- Cost variance. Quoted FOB versus invoiced FOB, including any post-PO surcharges for air freight, expedited trims, or rework.
Here is what does not belong on it. Subjective relationship scores. NPS-style satisfaction ratings from the sourcing team. Any metric that cannot be pulled from a PO, a receipt, or a QC record. The moment you let opinion onto the scorecard, the scorecard becomes a political document and stops being an operational one.
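To make the definitions above concrete, here is a minimal Python sketch of three of the raw metrics computed from a single PO. The `PurchaseOrder` class and its field names are hypothetical, the dates and the QC pass count are illustrative, and treating early shipments as on time is an assumption. Only the 1,200 ordered and 1,147 received figures and the two-day tolerance come from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PurchaseOrder:
    # Hypothetical field names; map them to whatever your PO system stores.
    committed_ex_factory: date
    actual_ex_factory: date       # date the goods physically left the factory
    units_ordered: int
    units_received: int
    units_passed_first_qc: int


def is_on_time(po: PurchaseOrder, tolerance_days: int = 2) -> bool:
    """On time if the goods left no more than tolerance_days after the committed date.

    Assumption: shipping early also counts as on time.
    """
    days_late = (po.actual_ex_factory - po.committed_ex_factory).days
    return days_late <= tolerance_days


def unit_variance_pct(po: PurchaseOrder) -> float:
    """Signed variance: negative means short-shipped, positive means over-shipped."""
    return (po.units_received - po.units_ordered) / po.units_ordered * 100


def first_pass_rate_pct(po: PurchaseOrder) -> float:
    """Share of received units that passed inbound QC on the first inspection."""
    return po.units_passed_first_qc / po.units_received * 100


# Illustrative PO: unit counts follow the example above, everything else is made up.
po = PurchaseOrder(
    committed_ex_factory=date(2024, 3, 1),
    actual_ex_factory=date(2024, 3, 4),
    units_ordered=1200,
    units_received=1147,
    units_passed_first_qc=1101,
)

print(is_on_time(po))                    # False: three days past the committed date
print(round(unit_variance_pct(po), 1))   # -4.4
print(round(first_pass_rate_pct(po), 1))
```

Keeping the sign on the unit variance is what lets chronic shorts and chronic overs be tracked separately later.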
How do you score the seven metrics in a way that is comparable across vendors?
A five-point scale per metric works. Anything more granular invites argument. Anything less granular flattens the signal.
- On-time delivery: 5 is 95 percent or better, 1 is below 70 percent.
- First-pass quality: 5 is 98 percent or better, 1 is below 90 percent.
- Spec accuracy: 5 is zero deviations, 1 is three or more per style.
- PO-to-receipt variance: 5 is within 1 percent, 1 is more than 5 percent.
- Lead time variance: 5 is within two days of quote, 1 is more than ten days late.
- Sample iterations: 5 is two rounds or fewer, 1 is five or more.
- Cost variance: 5 is zero overage, 1 is more than 5 percent over quote.
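The rubric fixes only the two ends of each scale, so any implementation has to pick the cut points for scores 2 through 4. The sketch below shows one way to do that for two of the metrics. The function names and the middle cut points are placeholders, not part of the rubric; only the outer bounds (70 and 95 percent for on-time delivery, 1 and 5 percent for unit variance) come from the text.

```python
from bisect import bisect_left, bisect_right

# Lower bounds for scores 2, 3, 4, 5. Only 70 and 95 come from the rubric;
# 80 and 90 are assumed placeholders to be agreed with the team.
ON_TIME_CUTS = [70.0, 80.0, 90.0, 95.0]

# Upper bounds for scores 5, 4, 3, 2. Only 1 and 5 come from the rubric;
# 2 and 3.5 are assumed placeholders.
UNIT_VARIANCE_CUTS = [1.0, 2.0, 3.5, 5.0]


def score_higher_is_better(value: float, cuts: list) -> int:
    """Return 1-5 for metrics where a larger value is better (e.g. on-time %)."""
    return 1 + bisect_right(cuts, value)


def score_lower_is_better(value: float, cuts: list) -> int:
    """Return 1-5 for metrics where a smaller value is better (e.g. |unit variance| %)."""
    return 5 - bisect_left(cuts, value)


print(score_higher_is_better(95.0, ON_TIME_CUTS))        # 5: 95 percent or better
print(score_higher_is_better(69.9, ON_TIME_CUTS))        # 1: below 70 percent
print(score_lower_is_better(abs(-4.4), UNIT_VARIANCE_CUTS))  # 2: inside 5 percent, well outside 1
print(score_lower_is_better(5.1, UNIT_VARIANCE_CUTS))    # 1: more than 5 percent
```

Writing the cut points down as data, rather than burying them in formulas, keeps the rubric identical for every vendor and makes any later change to it visible.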
Weight the metrics. On-time delivery and first-pass quality should each carry a weight of two, roughly double the others, because they are the metrics that actually break the season. Spec accuracy and PO-to-receipt variance carry a weight of one and a half. The remaining three carry a weight of one.
A factory with a weighted average of 4.5 or higher is a factory you give more volume to. A factory at 3.5 or below is on watch. A factory that scores below 3.0 across two consecutive seasons does not get a third season. That is the entire point of running the scorecard. It converts the renewal conversation from a relationship debate into a data review.
When does the scorecard get read, and by whom?
The scorecard gets read at three points in the season, and the timing is what makes it useful.
Mid-season, while bulk production is in progress, the production team reviews in-flight POs against the scoring rubric and flags any factory that is trending below 3.5 on its weighted average. This is the moment to pull volume forward, expedite trims, or move the next color to a different vendor.
End-of-season, within two weeks of the final delivery, every PO closes out with its scoring. This is the data that goes into the season-over-season trend.
Pre-sourcing, eight to ten weeks before the next season’s POs go out, the merchandising and production leads sit down with the cumulative scorecard. Vendors get tiered. Tier one gets first call on the volume. Tier two gets a smaller allocation and a conversation about the specific metrics that need to improve. Tier three either gets a corrective action plan with measurable targets or gets cut.
The sourcing decision should never be made without the scorecard open on the table.
Why does running this in spreadsheets fail at the $10M to $20M breakpoint?
For a brand under $5M, a spreadsheet works. There are six factories, forty POs a season, and one person who touches every order. The data lives in their head and the spreadsheet is a backup.
At the $10M to $20M breakpoint, that model collapses. There are fifteen to twenty-five factories, two hundred POs a season, three people involved in production, and the spreadsheet has been forked three times. For a $15M brand running wholesale and DTC with a 3PL in the mix, the production team is already spending six to nine hours a week reconciling inventory data across Shopify, the 3PL, and wholesale orders. Adding a parallel scorecard process on top of that is exactly the kind of work that does not get done.
What stalled rollouts have in common, in my experience, is that the customer tries to bolt a new tracking process onto a stack that does not capture the underlying data. You cannot score on-time delivery if the system does not store the committed ex-factory date alongside the actual ship date. You cannot score first-pass quality if QC results never get keyed in. The scorecard has to live where the POs live, or it does not live at all.
This is the architectural argument. A production module that holds the PO, the milestones, the QC outcome, and the receipt variance in one record is the only place the scorecard can be calculated automatically. Everything else degrades into a parallel spreadsheet that goes stale by week three.
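One way to picture that single record is as a data structure where plan and actual sit side by side, so it is obvious which metrics can be scored and which cannot. The sketch below is a hypothetical illustration in Python, not a description of any particular production module; the class, field names, and sample values are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class POProductionRecord:
    # Hypothetical record: plan and actual values live together per PO.
    po_number: str
    factory: str
    committed_ex_factory: Optional[date] = None
    actual_ex_factory: Optional[date] = None
    units_ordered: Optional[int] = None
    units_received: Optional[int] = None
    units_passed_first_qc: Optional[int] = None   # None if QC was never keyed in


def scorable_metrics(rec: POProductionRecord) -> List[str]:
    """List the metrics this record holds enough data to score."""
    metrics = []
    if rec.committed_ex_factory is not None and rec.actual_ex_factory is not None:
        metrics.append("on_time_delivery")
    if rec.units_ordered is not None and rec.units_received is not None:
        metrics.append("unit_variance")
    if rec.units_received is not None and rec.units_passed_first_qc is not None:
        metrics.append("first_pass_quality")
    return metrics


# Illustrative record where the inbound QC result was never entered.
rec = POProductionRecord(
    po_number="PO-1001",
    factory="Factory A",
    committed_ex_factory=date(2024, 3, 1),
    actual_ex_factory=date(2024, 3, 4),
    units_ordered=1200,
    units_received=1147,
)

print(scorable_metrics(rec))   # ['on_time_delivery', 'unit_variance']
```

A check like this, run at PO close-out, surfaces missing data while someone can still go and find the QC report.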
What is the right point of view on factory consolidation?
Here is the stand worth taking. Most apparel brands in the $10M to $50M range are working with too many factories, not too few. The instinct is to spread risk across vendors. The reality is that the bottom third of your vendor list is consuming a disproportionate share of production management time, generating a disproportionate share of quality incidents, and eroding margin through cost overruns and air freight.
If the bottom three factories on your scorecard account for less than 15 percent of your unit volume but more than 40 percent of your delay incidents, the right move is to consolidate that volume into your top tier. The scorecard is the document that gives you permission to do it.
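The test in the paragraph above is easy to run once the scorecard exists. Here is a minimal sketch in Python, assuming each factory is a dict with hypothetical keys `name`, `weighted_score`, `units`, and `delay_incidents`. The seven factories and their numbers are invented for illustration; only the 15 percent and 40 percent thresholds come from the text.

```python
def bottom_tier_check(factories, bottom_n=3, max_volume_share=0.15, min_delay_share=0.40):
    """Compare the lowest-scoring factories' share of volume with their share of delays."""
    ranked = sorted(factories, key=lambda f: f["weighted_score"])
    bottom = ranked[:bottom_n]
    total_units = sum(f["units"] for f in factories)
    total_delays = sum(f["delay_incidents"] for f in factories)
    # Guard against empty seasons so the shares never divide by zero.
    volume_share = sum(f["units"] for f in bottom) / total_units if total_units else 0.0
    delay_share = sum(f["delay_incidents"] for f in bottom) / total_delays if total_delays else 0.0
    return {
        "bottom": [f["name"] for f in bottom],
        "volume_share": volume_share,
        "delay_share": delay_share,
        "consolidate": volume_share < max_volume_share and delay_share > min_delay_share,
    }


# Invented season data for seven factories.
factories = [
    {"name": "Factory A", "weighted_score": 4.6, "units": 30000, "delay_incidents": 2},
    {"name": "Factory B", "weighted_score": 4.4, "units": 25000, "delay_incidents": 3},
    {"name": "Factory C", "weighted_score": 4.1, "units": 20000, "delay_incidents": 2},
    {"name": "Factory D", "weighted_score": 3.8, "units": 15000, "delay_incidents": 4},
    {"name": "Factory E", "weighted_score": 3.3, "units": 5000, "delay_incidents": 4},
    {"name": "Factory F", "weighted_score": 3.1, "units": 4000, "delay_incidents": 3},
    {"name": "Factory G", "weighted_score": 2.7, "units": 3000, "delay_incidents": 5},
]

result = bottom_tier_check(factories)
print(result["bottom"])                    # ['Factory G', 'Factory F', 'Factory E']
print(round(result["volume_share"], 3))    # 0.118
print(round(result["delay_share"], 3))     # 0.522
print(result["consolidate"])               # True
```

In this made-up season the bottom three factories ship about 12 percent of the units and cause about 52 percent of the delay incidents, which is the pattern that argues for consolidation.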
The counter-argument is that consolidation creates concentration risk. That is true, and it should be managed by tiering, not by keeping bad vendors on the roster. A healthy vendor base for a brand in this range looks like four to six tier-one factories taking 70 percent of volume, three to five tier-two factories taking 25 percent, and a small tier-three group taking the remaining 5 percent or so for capsules and tests.
What this means for an apparel operations team
The scorecard is not a reporting artifact. It is the instrument that turns sourcing from a relationship conversation into an operational one. If the production team cannot show, in numbers, why one vendor is on the list for next season and another is off, the decision is being made on memory and politics.
The work to set this up is real but bounded. Define the seven metrics. Agree on the five-point scale and the weights. Get the data captured at the PO level for one full season. By the second season the scorecard becomes self-sustaining, and by the third season the trend lines start driving real decisions.
This is BP2 work. Production drift is inevitable. The brands that handle it well are the ones that turn the drift into data and read it before the next season’s POs go out, not after.
Where is your operation on the 6 Breakpoints curve?
The assessment scores your apparel operation across all six breakpoints (product data, production, inventory truth, order flow, warehouse execution, reporting) and identifies which one is hurting you most.
Ronnell writes about onboarding, adoption, and operational readiness for apparel brands moving to a connected platform. His articles focus on what it takes to go live with confidence and sustain strong execution across channels, warehouses, and teams. As Head of Customer Success and Onboarding at Uphance, he leads the implementation phases that turn a software signature into running operations. He writes about kickoff scoping, data migration, sandbox cutover, change management patterns, and the stakeholder alignment work that determines whether a connected platform actually changes how a brand runs, or just adds another login to the existing chaos.
Ruchit writes about product strategy for apparel operations, covering how mid-market fashion brands use connected workflows to manage product development, inventory, orders, warehouse execution, and reporting. As Head of Product at Uphance, he shapes the roadmap that ties PLM, PIM, BOM management, allocation, fulfillment, and warehouse operations into one system. His articles dig into apparel-specific operational mechanics: tech packs, spec sheets, putaway, pick-pack, landed cost, and the data plumbing that makes inventory truth possible across multiple channels and locations. He focuses on the workflow-level questions that separate generic ERPs from systems built for how apparel brands actually run.
