
AI vs Human-Written Product Descriptions for NetSuite-Synced Catalogs: 8-Week A/B Results

We A/B-tested AI vs human-written product descriptions across three NetSuite-synced catalogs over 8 weeks. Traffic, conversion, and time-to-publish data.

We ran an experiment across three client catalogs this spring: half the SKUs got AI-generated product descriptions, half got new human-written ones. We measured organic traffic, add-to-cart rate (our proxy for conversion), and time-to-publish over eight weeks. The results were less binary than the AI hype implies, and more useful than the “AI is slop” backlash suggests.

Who this is for: commerce operators sitting on a NetSuite catalog of 1,000+ SKUs where most product descriptions are either thin or copied from the manufacturer’s data sheet. Not for stores selling 50 hand-curated items.

The setup

Three clients agreed to the experiment. One sells industrial fittings (4,200 SKUs), one sells specialty food ingredients (1,800 SKUs), one sells refurbished electronics (900 SKUs). Existing descriptions ranged from “copy-paste of the supplier PDF” to “two-sentence stub written in 2019.”

We split each catalog in half by SKU ID parity (odd vs even). That isn’t truly random, but it is arbitrary, stable, and easy to reproduce. One half got new descriptions written by a copywriter at $1.10 per SKU. The other half got new descriptions generated by an LLM from the same input data (manufacturer spec, NetSuite item attributes, parent category) at ~$0.04 per SKU. Both halves were re-indexed by Google over the following two weeks.
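A minimal sketch of a parity split like this one, assuming SKU IDs are integers or numeric strings. The function names are hypothetical, and which parity maps to which arm is an arbitrary choice here, not something the experiment specified.

```javascript
// Assign a SKU to an experiment arm by ID parity.
// Assumption: even IDs go to the AI arm, odd IDs to the human arm.
function assignArm(skuId) {
  const id = Number(skuId);
  if (!Number.isInteger(id)) {
    throw new Error(`SKU ID is not an integer: ${skuId}`);
  }
  return id % 2 === 0 ? "ai" : "human";
}

// Group a list of SKU IDs into the two arms, preserving input order.
function splitCatalog(skuIds) {
  const arms = { ai: [], human: [] };
  for (const id of skuIds) {
    arms[assignArm(id)].push(id);
  }
  return arms;
}
```

Because the assignment depends only on the ID, re-running the split later always puts a SKU in the same arm, which is what makes the before/after comparison stable.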

What we measured

  • Organic landing-page sessions over 8 weeks post-publish, vs the 8 weeks before.
  • Add-to-cart rate on the product page (we couldn’t measure conversion cleanly because checkout abandonment swamps small per-product effects).
  • Time-to-publish from “SKU created in NetSuite” to “description live on storefront.”
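All three metrics reduce to simple arithmetic. A sketch with hypothetical function names, assuming you can export session counts, add-to-cart counts, and timestamps from your analytics and from NetSuite:

```javascript
// Relative change between two periods: 0.25 means +25%.
// Returns null when there is no baseline to compare against.
function relativeChange(before, after) {
  if (before === 0) return null;
  return (after - before) / before;
}

// Share of product-page sessions that ended in an add-to-cart.
function addToCartRate(addToCarts, productPageSessions) {
  if (productPageSessions === 0) return null;
  return addToCarts / productPageSessions;
}

// Minutes between "SKU created" and "description live".
// Both arguments are ISO 8601 timestamps or Date objects.
function timeToPublishMinutes(createdAt, publishedAt) {
  return (new Date(publishedAt) - new Date(createdAt)) / 60000;
}
```

Returning null instead of dividing by zero keeps SKUs with no baseline traffic out of the averages rather than skewing them.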

The result that surprised us

On organic traffic, the AI descriptions performed within 5% of the human-written ones — statistically indistinguishable across the three catalogs. We expected the AI side to underperform because of the assumption that Google penalises AI content; we did not see that signal at the SKU level. What we did see is that both versions massively outperformed the original supplier-PDF descriptions, which is the comparison that actually matters for most stores.

The result that didn’t surprise us

On add-to-cart rate, human-written descriptions won by an average of 11% on the industrial fittings catalog, 4% on the food ingredients catalog, and were a wash on the refurbished electronics. The pattern: the more the buying decision depends on trust and judgement (technical specs, food safety), the more a human voice helps. The more the decision is utilitarian (the customer already knows they want a refurbished Dell), the less it matters.

The result that mattered most

Time-to-publish. The human pipeline averaged 6 days from “SKU exists in NetSuite” to “description live.” The AI pipeline averaged 14 minutes — and that’s including the human-in-the-loop review step. For a catalog adding 100 new SKUs a month, that’s the difference between “we have product pages by next Wednesday” and “we’ll get to it eventually.”

What we actually shipped

The hybrid pipeline. For every new SKU in NetSuite, an LLM drafts a description from the structured item record. The draft lands in a review queue. A human reviewer can approve in one click (which publishes verbatim), light-edit (which logs the edit so we can improve the prompt over time), or full-rewrite (used for the top 200 SKUs by revenue, where the human voice actually moves numbers).
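The three review outcomes can be modelled as one small function. This is a sketch only: the names and return shape are hypothetical, and just the three actions and the top-200 cutoff come from the description above.

```javascript
// Top SKUs by revenue are routed to a full human rewrite.
const FULL_REWRITE_CUTOFF = 200;

// Suggest a review path from a SKU's revenue rank (1 = highest revenue).
function suggestedAction(revenueRank) {
  return revenueRank <= FULL_REWRITE_CUTOFF ? "full-rewrite" : "approve-or-edit";
}

// Resolve a review decision into the text to publish, plus optional
// feedback for prompt tuning (only light edits are logged).
function applyReview(draft, action, humanText = null) {
  if (action === "approve") {
    return { published: draft, promptFeedback: null };
  }
  if (action !== "light-edit" && action !== "full-rewrite") {
    throw new Error(`Unknown review action: ${action}`);
  }
  if (typeof humanText !== "string" || humanText.trim() === "") {
    throw new Error(`${action} requires reviewer text`);
  }
  return {
    published: humanText,
    promptFeedback: action === "light-edit" ? { draft, edited: humanText } : null,
  };
}
```

Keeping the original draft next to the edited text is what makes the light-edit log usable later: each entry shows exactly what the prompt got wrong.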

// The prompt structure that performed best for us.
// `item` is the item record from NetSuite, as a plain object.
const buildPrompt = (item) => `
You are writing a product description for an online store.

PRODUCT DATA (from NetSuite):
${JSON.stringify(item, null, 2)}

WRITE: a description of 80–140 words, in plain English, no marketing
adjectives, no "elevate your" phrasing. Lead with the single most
important fact a buyer needs (compatibility, size, certification).
End with one sentence about who this is for.

DO NOT: invent specs not present in the data above. If unsure, omit.
`;

The “do not invent specs” instruction matters. The single biggest review-queue rejection was hallucinated certifications (“FDA-approved” on a food ingredient that wasn’t). That’s a legal problem, not just a quality one.
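A prompt instruction alone won’t catch every invented claim, so a cheap automated check before the review queue can help. The sketch below flags certification-style phrases that appear in a draft but nowhere in the source item data. The pattern list is a hypothetical starting point, not a complete or legally vetted list, and a flag is a prompt for the reviewer, not a verdict.

```javascript
// Hypothetical watchlist of claim phrases worth flagging.
const CLAIM_PATTERNS = [
  /\bFDA[- ]approved\b/i,
  /\bUL[- ]listed\b/i,
  /\bCE[- ]marked\b/i,
  /\bRoHS\b/i,
  /\bISO\s?\d{4,5}\b/i,
  /\bkosher\b/i,
  /\borganic\b/i,
];

// Return the claim phrases found in the draft that have no match
// anywhere in the serialised item record.
function unsupportedClaims(draft, item) {
  const source = JSON.stringify(item);
  const flagged = [];
  for (const pattern of CLAIM_PATTERNS) {
    const match = draft.match(pattern);
    if (match && !pattern.test(source)) {
      flagged.push(match[0]);
    }
  }
  return flagged;
}
```

A draft that returns a non-empty list can be pushed to the top of the review queue or blocked from one-click approval.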

The NetSuite plumbing

We trigger the pipeline when an item is created in NetSuite. A SuiteScript User Event script posts the item payload to a small queue; a worker calls the LLM API and writes the draft back to a custom field on the item (custitem_ai_draft_description). The storefront sync only picks up descriptions from custitem_approved_description, which is set by the reviewer, not the AI. This keeps the storefront source of truth clean and makes the review step impossible to skip.
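The two-field separation is the part worth getting right. A minimal sketch in plain JavaScript, assuming item records arrive as plain objects keyed by field ID; the function names are hypothetical, and the actual SuiteScript and queue wiring is left out.

```javascript
const DRAFT_FIELD = "custitem_ai_draft_description";
const APPROVED_FIELD = "custitem_approved_description";

// Field update the worker writes back after the LLM call.
// It only ever touches the draft field.
function draftFieldUpdate(draftText) {
  return { [DRAFT_FIELD]: draftText };
}

// Field update produced by a human review decision.
function approvalFieldUpdate(reviewedText) {
  return { [APPROVED_FIELD]: reviewedText };
}

// What the storefront sync is allowed to publish: the approved
// field or nothing. An unreviewed draft is never a fallback.
function storefrontDescription(item) {
  const approved = item[APPROVED_FIELD];
  return typeof approved === "string" && approved.trim() !== "" ? approved : null;
}
```

Since `storefrontDescription` never reads the draft field, a bug or outage in the AI worker can delay a description but cannot publish an unreviewed one.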

FAQ

Doesn’t Google penalise AI-generated content?

Google’s stated position is that they reward useful content regardless of origin and penalise unhelpful content regardless of origin. Our SKU-level data is consistent with that. A thin, unhelpful, AI-generated description will rank worse than a detailed, useful one, but that’s also true of human-written content. The originality of the structured data behind the description matters more than who typed the prose.

What model did you use?

We tested across three frontier-class LLMs from different providers. The output quality difference was smaller than the difference between a good and a bad prompt. We won’t name the winners because the rankings shifted twice during the experiment — model providers ship updates faster than blog posts age.

How do you keep the AI-generated descriptions consistent with brand voice?

One paragraph of brand guidelines in the system prompt, plus three or four example product descriptions from the existing site. Few-shot examples beat prose instructions for voice consistency. We refresh the examples every quarter so the voice doesn’t drift.
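One way to assemble that input is a chat-style message list, with each example as a user/assistant pair. This is a sketch: the role/content shape is a common convention but differs between providers (some take the system prompt as a separate parameter), and the function and field names are hypothetical.

```javascript
// Build a few-shot message list: brand guidelines first, then each
// example as an (item data -> description) pair, then the new item.
function buildMessages(brandGuidelines, examples, item) {
  const messages = [{ role: "system", content: brandGuidelines }];
  for (const example of examples) {
    messages.push({ role: "user", content: JSON.stringify(example.item) });
    messages.push({ role: "assistant", content: example.description });
  }
  messages.push({ role: "user", content: JSON.stringify(item) });
  return messages;
}
```

Storing the examples as data rather than inside the prompt text makes the quarterly refresh a content change instead of a code change.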

What’s a reasonable per-SKU cost in production?

For a single descriptive paragraph: $0.02–$0.06 per SKU at frontier-class model pricing as of mid-2026, including a structured-output retry pass. For a 1,000-SKU monthly cadence, that’s a $20–$60 line item — well under the cost of one freelance copywriter day.
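The arithmetic behind that line item, as a small sketch. Costs are passed in cents to avoid floating-point rounding; the function name is hypothetical and the inputs are the per-SKU figures quoted in this post.

```javascript
// Monthly description cost in dollars for a given SKU volume.
function monthlyCostDollars(skusPerMonth, costPerSkuCents) {
  return (skusPerMonth * costPerSkuCents) / 100;
}

const aiLow = monthlyCostDollars(1000, 2);         // $0.02 per SKU
const aiHigh = monthlyCostDollars(1000, 6);        // $0.06 per SKU
const copywriter = monthlyCostDollars(1000, 110);  // $1.10 per SKU

console.log(`AI: $${aiLow}-$${aiHigh} per month, copywriter: $${copywriter} per month`);
```

Swap in your own volume and per-SKU rates; the gap scales linearly with catalog growth.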

The honest recommendation

If your catalog has more than 500 SKUs and you’ve ever skipped writing a description because you ran out of time, the hybrid pipeline pays for itself in week two. If you sell 50 products that customers pick up because of how you talk about them, keep writing them yourself. Most stores live in the first category and act as if they’re in the second.

We build this pipeline as part of our WooCommerce + NetSuite and Shopify + NetSuite implementation packages. The integration code is the same; only the storefront write target changes.

