Introduction

SkuSync is a specialized data transformation tool designed to convert standard e-commerce product exports into formats optimized for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines.

Why use SkuSync?

Raw HTML and CSV data are noisy for AI models. SkuSync strips away presentation layers and structures your product catalog into semantic data that AI agents can "read" efficiently, reducing token usage and improving context understanding.

1. Getting Started

Follow these steps to generate your first AI-ready dataset. No coding knowledge is required.

1. Export Data

Go to your SHOPLINE admin panel, navigate to Products, and export your catalog as "All products".

2. Upload & Convert

Drag your CSV into SkuSync. The browser-based engine parses it instantly without server uploads.

2. AI Checker: Competitor Analysis Tool

Analyze competitor websites for AI-ready data formats. Discover how easily Large Language Models (LLMs) can parse their content and identify opportunities to outperform them.

What AI Checker Does

AI Checker scans any website for AI-ready data formats that help LLMs understand and index product content. It's like an SEO audit, but for the AI era.

Five AI-Ready Data Formats Checked

llms.txt

Proposed standard file that tells LLMs about your site structure and content

Schema.org Markup

Structured data that helps search engines and AI understand product information

NDJSON Feed

Machine-readable data stream for efficient bulk processing by AI systems

robots.txt

Crawler instructions written for search engines that many AI crawlers also honor

sitemap.xml

Site structure map that helps crawlers discover and index all your pages

Understanding AI Readiness Score

AI Checker calculates a 0-100 score based on three key dimensions:

Discovery

How easily AI agents can find and access your data (llms.txt, sitemap.xml, robots.txt)

Structure

How well your content is organized with semantic markup (Schema.org, structured data)

Machine Readability

How easily machines can parse and understand your data (NDJSON feeds, clean formats)

Scoring Algorithm & Grade System

The AI Readiness Score (0-100) is calculated based on the presence and quality of AI-ready data formats. Each component contributes to the total score according to its importance for AI discoverability.

Score Weights (Maximum 100 points)

Component	Max Score	Scoring Rules
llms.txt	25	Basic existence: 15 pts; Complete version: +10 pts
Schema.org	35	High quality (≥6 fields): 35 pts; Medium (≥3 fields): 20 pts; Low (<3 fields): 10 pts
NDJSON/API	25	NDJSON exists: 25 pts; Only JSON API: 15 pts
Discoverability	15	sitemap.xml: 8 pts; robots.txt: 7 pts
Total Maximum	100	
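
The weight table can be read as a small scoring function. The following is a minimal sketch, not AI Checker's actual implementation: the input shape (`llmsTxt`, `schemaFields`, `hasNdjson`, `hasJsonApi`, `hasSitemap`, `hasRobots`) is hypothetical, and it assumes that a site with no Schema.org markup at all earns 0 points for that component.

```javascript
// Hypothetical input shape; these field names are illustrative only.
//   llmsTxt:      null | "basic" | "complete"
//   schemaFields: number of Schema.org fields found, or null if no markup exists
function aiReadinessScore(site) {
  let score = 0;

  // llms.txt: 15 pts for existence, +10 pts for a complete version (max 25)
  if (site.llmsTxt === "basic") score += 15;
  if (site.llmsTxt === "complete") score += 25;

  // Schema.org: tiered by field count (max 35); assumed 0 pts when absent
  const fields = site.schemaFields;
  if (fields != null) score += fields >= 6 ? 35 : fields >= 3 ? 20 : 10;

  // NDJSON/API: 25 pts for an NDJSON feed, 15 pts if only a JSON API exists
  if (site.hasNdjson) score += 25;
  else if (site.hasJsonApi) score += 15;

  // Discoverability: sitemap.xml 8 pts, robots.txt 7 pts (max 15)
  if (site.hasSitemap) score += 8;
  if (site.hasRobots) score += 7;

  return score;
}

const example = {
  llmsTxt: "basic",
  schemaFields: 4,
  hasNdjson: false,
  hasJsonApi: true,
  hasSitemap: true,
  hasRobots: true,
};
console.log(aiReadinessScore(example)); // 15 + 20 + 15 + 8 + 7 = 65
```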

Why Schema.org Has the Highest Weight (35 points)

Schema.org structured data is the most widely adopted vocabulary for describing web content to machines, and search engines and AI systems (such as ChatGPT, Claude, and Gemini) can use it to interpret product pages. Without it, AI systems have to infer product details from unstructured HTML, which can reduce accuracy and hurt your visibility in AI-powered search results.

Grade System

Grade	Score	Meaning
A+	90-100 points	Excellent - AI-ready leader
A	80-89 points	Great - Very competitive
B+	70-79 points	Very Good - Above average
B	60-69 points	Good - Room for improvement
C	50-59 points	Fair - Needs significant work
D/F	0-49 points	Poor - Not AI-ready
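
The grade bands amount to a threshold lookup. Here is a minimal sketch of that lookup; the function name `gradeFor` is hypothetical, and the single-letter labels A, B, and C for the 80-89, 60-69, and 50-59 bands are an assumption based on the surrounding grades.

```javascript
// Grade bands, highest threshold first.
// The labels "A", "B", and "C" are assumed for the 80-89, 60-69, and 50-59 bands.
const GRADE_BANDS = [
  { min: 90, grade: "A+", label: "Excellent" },
  { min: 80, grade: "A", label: "Great" },
  { min: 70, grade: "B+", label: "Very Good" },
  { min: 60, grade: "B", label: "Good" },
  { min: 50, grade: "C", label: "Fair" },
  { min: 0, grade: "D/F", label: "Poor" },
];

function gradeFor(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("score must be between 0 and 100");
  }
  // The first band whose minimum the score reaches is the matching grade.
  return GRADE_BANDS.find((band) => score >= band.min);
}

console.log(gradeFor(65).grade); // "B"
```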

How to Use AI Checker

1. Navigate to the AI Checker page
2. Enter a competitor's website URL (e.g., https://example.com)
3. Click "Analyze" to start the scan
4. Review the technical scan results for each data format
5. Check the AI Readiness Score to see their overall AI-friendliness
6. Use SkuSync to generate better AI-Ready data and outperform competitors

Ready to Analyze Your Competitors?

Use AI Checker to discover AI readiness gaps and gain a competitive edge.


AI Checker SEO/GEO/AEO Templates (Audit Reference)

AI Checker does not generate files automatically. Use these templates as implementation references when a rule fails or warns.

/llms.txt

# Brand / Site Name

## Docs
- [About](https://example.com/about): Company overview
- [Policies](https://example.com/policies): Policy index

## Products
- [Catalog](https://example.com/products): Product index

/llms-full.md

# Brand Long-form Context

## Policies
...

## Product Knowledge
...

## Support FAQ
...

/products.ndjson

{"item_id":"SKU-1","title":"Demo Product","description":"Short product summary","url":"https://example.com/products/demo","brand":"Brand","seller_name":"Store","seller_url":"https://example.com","is_eligible_search":true}
{"item_id":"SKU-2","title":"Demo Product 2","description":"Short product summary","url":"https://example.com/products/demo-2","brand":"Brand","seller_name":"Store","seller_url":"https://example.com","is_eligible_search":true}

/feed.xml (RSS 2.0)

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Brand Updates</title>
    <link>https://example.com</link>
    <description>Latest updates</description>
  </channel>
</rss>

/robots.txt

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

User-agent: Google-Extended
Allow: /

/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/demo</loc>
    <lastmod>2026-03-01T09:00:00Z</lastmod>
  </url>
</urlset>

Meta Robots / X-Robots-Tag

<meta name="robots" content="index,follow,max-snippet:160,max-image-preview:large" />
X-Robots-Tag: index, follow, max-snippet:160

Head Tags (Canonical / Hreflang / OG / Twitter)

<link rel="canonical" href="https://example.com/products/demo" />
<link rel="alternate" hreflang="en-US" href="https://example.com/en/products/demo" />
<link rel="alternate" hreflang="fr-FR" href="https://example.com/fr/products/demo" />
<meta property="og:title" content="Demo Product" />
<meta property="og:description" content="Demo summary" />
<meta property="og:image" content="https://example.com/og.jpg" />
<meta name="twitter:card" content="summary_large_image" />

JSON-LD (Organization / Product / Article)

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Brand",
  "sameAs": ["https://x.com/brand", "https://www.linkedin.com/company/brand"]
}
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to choose a product",
  "datePublished": "2026-03-01"
}
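
The heading above lists Product, but the template shows only Organization and Article objects. As a rough sketch, a Product object could be derived from one line of the /products.ndjson template. The function name `toProductJsonLd` and the mapping from feed fields to schema.org properties are assumptions, not something AI Checker or SkuSync is documented to produce.

```javascript
// Builds a schema.org Product object from one products.ndjson record.
// The field mapping (item_id -> sku, title -> name, ...) is an assumption.
function toProductJsonLd(record) {
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    sku: record.item_id,
    name: record.title,
    description: record.description,
    url: record.url,
    brand: { "@type": "Brand", name: record.brand },
  };
}

const line =
  '{"item_id":"SKU-1","title":"Demo Product","description":"Short product summary",' +
  '"url":"https://example.com/products/demo","brand":"Brand"}';

// Embed the result in a <script type="application/ld+json"> tag on the product page.
console.log(JSON.stringify(toProductJsonLd(JSON.parse(line)), null, 2));
```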

/humans.txt

/* TEAM */
Team: Brand Team
Contact: team@example.com

/* SITE */
Language: en-US
Standards: HTML5, JSON-LD

/.well-known/security.txt

Contact: mailto:security@example.com
Expires: 2027-12-31T23:59:59Z
Preferred-Languages: en, zh

Shopify & SHOPLINE CSV Structure

SkuSync automatically detects and supports both Shopify and SHOPLINE export formats. Upload your CSV file - the tool will identify the platform and parse accordingly.

products_export.csv (CSV)

Handle,Title,Body (HTML),Vendor,Price,Image Src
classic-tee,"Classic Cotton Tee","<p>100% organic cotton</p>",BrandX,29.99,https://.../tee.jpg
slim-jeans,"Slim Fit Denim","<p>Indigo wash</p>",BrandX,89.00,https://.../jeans.jpg

Pro Tip: Automatic Platform Detection

SkuSync automatically detects whether your CSV is from Shopify or SHOPLINE by analyzing the header structure. No manual configuration needed - just upload and go!

3. Format Specifications

JSON Schema

The JSON Schema output provides a strict type definition for your product data. This is essential when using "Function Calling" or "Tools" with OpenAI's GPT-4 or Anthropic's Claude, ensuring the model generates valid parameters.

{ "type": "object", "properties": { "title": { "type": "string" }, "price": { "type": "number" } ... } }

llms.txt

Following the proposed /llms.txt standard, this format uses simplified Markdown to present content. It strips HTML tags, script blocks, and CSS classes, leaving only the semantic content relevant for training or context windows.

# Product Catalog Context
## Classic Cotton Tee
ID: 10234
Price: $29.99
Description: 100% organic cotton, pre-shrunk, available in earth tones.
## Slim Fit Denim
ID: 10235
...
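
To illustrate how product records could be rendered into this layout, here is a minimal sketch. It is not SkuSync's converter: the field names (`id`, `title`, `price`, `bodyHtml`) are hypothetical, and the regex-based tag stripper is a simplification that suits short product descriptions only.

```javascript
// Naive tag stripper; a real HTML parser is safer for untrusted or complex markup.
function stripHtml(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "") // drop script/style blocks
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// `products` uses hypothetical field names (id, title, price, bodyHtml).
function toLlmsTxt(products) {
  const sections = products.map((p) =>
    [
      `## ${p.title}`,
      `ID: ${p.id}`,
      `Price: $${p.price.toFixed(2)}`,
      `Description: ${stripHtml(p.bodyHtml)}`,
    ].join("\n")
  );
  return ["# Product Catalog Context", ...sections].join("\n");
}

const sample = [
  { id: 10234, title: "Classic Cotton Tee", price: 29.99, bodyHtml: "<p>100% organic cotton</p>" },
];
console.log(toLlmsTxt(sample));
```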

NDJSON (Newline Delimited JSON)

NDJSON is the preferred format for bulk data ingestion into vector databases like Pinecone or Weaviate. Each line is a standalone valid JSON object, allowing for stream processing without loading the entire dataset into memory.
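
Stream processing could look like the following sketch in Node.js, which parses one record at a time instead of reading the whole file. The helper name `readNdjson` is hypothetical, and the demo feeds it an in-memory stream so the example is self-contained.

```javascript
const readline = require("node:readline");
const { Readable } = require("node:stream");

// Yields one parsed object per non-empty line without buffering the whole input.
async function* readNdjson(input) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    if (line.trim() === "") continue; // tolerate blank lines
    yield JSON.parse(line);
  }
}

// For a real file, pass fs.createReadStream("products.ndjson") instead.
async function main() {
  const demo = Readable.from(['{"item_id":"SKU-1"}\n{"item_id":"SKU-2"}\n']);
  for await (const product of readNdjson(demo)) {
    console.log(product.item_id);
  }
}

main();
```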

4. Usage Guide

Learn how to integrate SkuSync outputs into your AI-powered workflows and applications.

OpenAI Integration

Use the generated JSON Schema with OpenAI's Function Calling for structured product queries.

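// Using JSON Schema with OpenAI function calling
// (run as an ES module so top-level await works)
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Find red shoes under $50" }],
  // "tools" replaces the deprecated "functions" parameter
  tools: [{
    type: "function",
    function: {
      name: "search_products",
      parameters: jsonSchema // Use generated JSON Schema
    }
  }]
});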

Vector Database Import

Import NDJSON output into Pinecone or other vector databases for semantic search.

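// Importing NDJSON to Pinecone
// Assumes `pineconeIndex` (an initialized Pinecone index client) and
// `embed` (a function returning an embedding vector) are defined elsewhere.
const fs = require('fs');

async function importProducts(path) {
  const ndjsonLines = fs.readFileSync(path, 'utf-8').split('\n');

  for (const line of ndjsonLines) {
    if (!line.trim()) continue; // skip blank lines
    const product = JSON.parse(line);
    // The upsert call shape differs between Pinecone SDK versions; check yours.
    await pineconeIndex.upsert({
      vectors: [{
        id: product.id,
        metadata: product,
        values: await embed(product.triples.join(' '))
      }]
    });
  }
}

// CommonJS has no top-level await, so the loop runs inside an async function.
importProducts('products.ndjson').catch(console.error);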

RAG Pipeline Setup

Build a Retrieval-Augmented Generation pipeline using llms.txt as context.

// RAG Pipeline with llms.txt context
async function queryProductCatalog(userQuery) {
  // 1. Retrieve relevant context from llms.txt
  const context = retrieveContext(userQuery, llmsTxtContent);

  // 2. Augment query with context
  const prompt = `Context:
${context}

Question: ${userQuery}`;

  // 3. Generate response
  return await llm.generate(prompt);
}
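
The `retrieveContext` call above is left undefined. One very simple way to fill it in is keyword overlap over the `## ` sections of llms.txt, sketched below. This is an assumption for illustration (including the optional `topK` parameter); production pipelines usually rank sections with embeddings instead.

```javascript
// Hypothetical helper: splits llms.txt into "## " sections and ranks them by
// how many query words appear in each one (plain substring matching).
function retrieveContext(userQuery, llmsTxtContent, topK = 2) {
  const words = userQuery.toLowerCase().match(/[a-z0-9]+/g) || [];
  const sections = llmsTxtContent
    .split(/\n(?=## )/) // break before every "## " heading
    .filter((s) => s.startsWith("## "));

  return sections
    .map((text) => {
      const haystack = text.toLowerCase();
      const score = words.filter((w) => haystack.includes(w)).length;
      return { text, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.text)
    .join("\n\n");
}

const llmsTxt = [
  "# Product Catalog Context",
  "## Classic Cotton Tee",
  "Price: $29.99",
  "Description: 100% organic cotton",
  "## Slim Fit Denim",
  "Price: $89.00",
  "Description: Indigo wash",
].join("\n");

console.log(retrieveContext("Do you have organic cotton shirts?", llmsTxt));
```

Substring matching is crude (short words can match inside longer ones), so treat the ranking as a placeholder for a proper retriever.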

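SHOPLINE 2.0: Placing txt / ndjson Files in the Root Directory

The following is compiled from the text and screenshots of a Shopline Community tutorial. It applies to mapping files such as ads.txt, llms.txt, and products.ndjson to the site root path.

Source: How to place a txt file in the root directory (Shopline Community)

1. Upload the txt file in the Admin panel

Path: Settings -> File library -> Upload file -> Add file

2. Copy the file link

3. Create a redirect that points to the file link

Notes:

In multi-domain setups, the redirect points the root path of every domain to the same specified address.
The redirect target accepts a full URL and is not limited to relative paths.

3.1. Domains -> Redirects -> Manage redirects

3.2. Add a redirect and set the root path (e.g., ads.txt) to redirect to the link copied in step 2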

5. Best Practices

Data Quality Checklist

• Ensure CSV is UTF-8 encoded for proper character handling
• Verify required columns (Handle, Title*, Vendor, Tags, Collections)
• Check for duplicate SKUs to avoid data conflicts
• Validate image URLs are accessible and properly formatted
• Remove any unnecessary HTML from product descriptions
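
Two of the checklist items, required columns and duplicate SKUs, are easy to automate before uploading. The sketch below assumes the CSV has already been parsed into row objects keyed by header. The function name `checkCatalog` is hypothetical, and the default SKU column name `Variant SKU` is an assumption; pass your export's header if it differs.

```javascript
const REQUIRED_COLUMNS = ["Handle", "Title", "Vendor", "Tags", "Collections"];

// `rows` is an array of objects keyed by CSV header.
// The SKU column name ("Variant SKU") is an assumption, so it is a parameter.
function checkCatalog(headers, rows, skuColumn = "Variant SKU") {
  const missingColumns = REQUIRED_COLUMNS.filter((c) => !headers.includes(c));

  const seen = new Set();
  const duplicateSkus = new Set();
  for (const row of rows) {
    const sku = (row[skuColumn] || "").trim();
    if (!sku) continue; // rows without a SKU are not counted as duplicates
    if (seen.has(sku)) duplicateSkus.add(sku);
    seen.add(sku);
  }

  return { missingColumns, duplicateSkus: [...duplicateSkus] };
}

const report = checkCatalog(
  ["Handle", "Title", "Vendor", "Variant SKU"],
  [{ "Variant SKU": "TEE-1" }, { "Variant SKU": "TEE-1" }, { "Variant SKU": "JEAN-1" }]
);
console.log(report);
```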

Large Dataset Strategy

• Split files larger than 10,000 products for better performance
• Use batch mode for multiple files to merge results efficiently
• Monitor browser memory usage when processing large datasets
• Consider using NDJSON for streaming large datasets to databases
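
Splitting a large catalog into files of at most 10,000 products can be scripted. A minimal sketch follows, assuming the rows are already in memory; the helper name `splitIntoBatches` is hypothetical.

```javascript
// Splits an array of product rows into consecutive batches of at most `maxPerBatch`.
function splitIntoBatches(rows, maxPerBatch = 10000) {
  if (!Number.isInteger(maxPerBatch) || maxPerBatch < 1) {
    throw new RangeError("maxPerBatch must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < rows.length; i += maxPerBatch) {
    batches.push(rows.slice(i, i + maxPerBatch));
  }
  return batches;
}

const rows = Array.from({ length: 25000 }, (_, i) => ({ handle: `product-${i}` }));
console.log(splitIntoBatches(rows).map((b) => b.length)); // [ 10000, 10000, 5000 ]
```

Each batch can then be written to its own CSV (repeating the header row) and processed together in batch mode.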

Integration Best Practices

• Store NDJSON in version control for data tracking and rollback
• Create automated CI/CD pipelines for regular data updates
• Set up scheduled syncs to keep AI systems updated with latest products
• Deploy llms.txt to your website root for AI crawler discovery

Troubleshooting

What's the maximum file size supported?

SkuSync supports CSV files up to 50MB per file. For larger datasets, split your data into multiple files and process them together in batch mode. The results will be automatically merged.

How are special characters in product titles handled?

The parser correctly handles quoted fields containing commas, newlines, and special characters. Ensure your CSV uses proper quoting (double quotes) around fields that contain special characters.
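
To show what proper quoting means in practice, here is a minimal sketch of a CSV parser that follows the usual quoting rules: double quotes wrap a field, and a doubled quote inside a quoted field stands for a literal quote. It is an illustration, not SkuSync's parser.

```javascript
// Minimal CSV parser: handles quoted fields containing commas, newlines,
// and escaped quotes (""). Returns an array of rows, each an array of strings.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last line when the file does not end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const csv = 'Handle,Title\nclassic-tee,"Tee, ""Classic"" edition"\n';
console.log(parseCsv(csv));
```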

How can I improve conversion speed for large files?

For optimal performance with large files:
• Use a modern browser (Chrome or Edge recommended)
• Close unnecessary browser tabs to free up memory
• Ensure your device has sufficient RAM (4GB+ recommended)
• Use batch mode for multiple smaller files instead of one large file

Is my data sent to any server?

No. All processing happens locally in your browser using JavaScript. Your data never leaves your device. You can verify this by disconnecting from the internet after loading the page; the tool will continue to work offline.

My file is not parsing correctly

Ensure you are using the default CSV encoding (UTF-8). Some Excel exports use UTF-16LE which may cause issues. Try opening your CSV in a text editor and saving it specifically as UTF-8.
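
If you prefer to check the encoding programmatically, the byte order mark (BOM) at the start of the file is a useful hint. The sketch below is an illustration under that assumption (the function name `decodeCsvBytes` is hypothetical): it treats a leading FF FE as UTF-16LE and everything else as UTF-8, so a UTF-16 file saved without a BOM would not be detected.

```javascript
// Decodes CSV bytes, using the BOM to spot UTF-16LE exports.
function decodeCsvBytes(bytes) {
  const isUtf16le = bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe;
  const decoder = new TextDecoder(isUtf16le ? "utf-16le" : "utf-8");
  // Drop a leading BOM character in case the decoder left one in place.
  return decoder.decode(bytes).replace(/^\uFEFF/, "");
}

// "Hi" encoded as UTF-16LE with a BOM, then as plain UTF-8.
console.log(decodeCsvBytes(new Uint8Array([0xff, 0xfe, 0x48, 0x00, 0x69, 0x00]))); // Hi
console.log(decodeCsvBytes(new Uint8Array([0x48, 0x69]))); // Hi
```

In a browser, the bytes can come from `new Uint8Array(await file.arrayBuffer())` on the selected file.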

Images are missing in the output

SkuSync looks for the Image Src column. If your export uses a different header (e.g., from a custom app), rename the column header to Image Src before uploading.

Can I customize the triples extraction?

Yes! Use the "Customize Output Configuration" panel to select which fields should be extracted as semantic triples. You can include or exclude Vendor, Tags, Collections, Title, and Subtitle fields based on your needs.

What browsers are supported?

SkuSync works on all modern browsers including Chrome, Firefox, Safari, and Edge. For the best performance, we recommend using the latest version of Chrome or Edge with JavaScript enabled.

Does the JSON Schema comply with Google Shopping standards?

Yes. The JSON Schema output from SkuSync is fully compliant with the Google Merchant Center Product Feed Specification, which is the official standard for Google Shopping ads and free listings.

The generated schema includes all required fields per Google's specification:

id – Unique product identifier
title – Product title
description – Product description
link – Product landing page URL
image_link – Main product image URL
availability – Stock status
price – Price with currency code
condition – Item condition
brand – Brand name

The schema also includes recommended fields such as gtin (GTIN/EAN/UPC), mpn (Manufacturer Part Number), sku, and variant data.

This ensures your product feed can be uploaded to Google Merchant Center for Shopping ads and free listings. See the Google Merchant Center Product Feed Specification for details.