SkuSync

Introduction

SkuSync is a specialized data transformation tool designed to convert standard e-commerce product exports into formats optimized for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines.

Why use SkuSync?

Raw HTML and CSV data are noisy for AI models. SkuSync strips away presentation layers and structures your product catalog into semantic data that AI agents can "read" efficiently, reducing token usage and improving context understanding.


1 Getting Started

Follow these steps to generate your first AI-ready dataset. No coding knowledge is required.

1. Export Data

Go to your SHOPLINE admin panel, navigate to Products, and export your catalog as "All products".

2. Upload & Convert

Drag your CSV into SkuSync. The browser-based engine parses it instantly without server uploads.


2 AI Checker: Competitor Analysis Tool

Analyze competitor websites for AI-ready data formats. Discover how easily Large Language Models (LLMs) can parse their content and identify opportunities to outperform them.

What AI Checker Does

AI Checker scans any website for AI-ready data formats that help LLMs understand and index product content. It's like an SEO audit, but for the AI era.

Five AI-Ready Data Formats Checked

llms.txt

Markdown file, following the proposed llms.txt standard, that tells LLMs about your site structure and content

Schema.org Markup

Structured data that helps search engines and AI understand product information

NDJSON Feed

Machine-readable data stream for efficient bulk processing by AI systems

robots.txt

Crawler access rules for search engines, which well-behaved AI crawlers also honor

sitemap.xml

Site structure map that helps crawlers discover and index all your pages

Understanding AI Readiness Score

AI Checker calculates a 0-100 score based on three key dimensions:

Discovery

How easily AI agents can find and access your data (llms.txt, sitemap.xml, robots.txt)

Structure

How well your content is organized with semantic markup (Schema.org, structured data)

Machine Readability

How easily machines can parse and understand your data (NDJSON feeds, clean formats)

Scoring Algorithm & Grade System

The AI Readiness Score (0-100) is calculated based on the presence and quality of AI-ready data formats. Each component contributes to the total score according to its importance for AI discoverability.

Score Weights (Maximum 100 points)

Component | Max Score | Scoring Rules
llms.txt | 25 | Basic existence: 15 pts; complete version: +10 pts
Schema.org | 35 | High quality (≥6 fields): 35 pts; medium (≥3 fields): 20 pts; low (<3 fields): 10 pts
NDJSON/API | 25 | NDJSON exists: 25 pts; only JSON API: 15 pts
Discoverability | 15 | sitemap.xml: 8 pts; robots.txt: 7 pts
Total Maximum | 100 |

Why Schema.org Has the Highest Weight (35 points)

Schema.org structured data is a widely adopted vocabulary that search engines and AI systems can use to interpret web content. Without it, they have to infer product details from unstructured page text, which can hurt your visibility in AI-powered search results.

Grade System

Grade | Score | Meaning
A+ | 90-100 points | Excellent - AI-ready leader
A | 80-89 points | Great - Very competitive
B+ | 70-79 points | Very Good - Above average
B | 60-69 points | Good - Room for improvement
C | 50-59 points | Fair - Needs significant work
D/F | 0-49 points | Poor - Not AI-ready
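
As a rough illustration, the score weights and grade bands above can be expressed in a few lines of JavaScript. This is a sketch only: the `scan` input shape and the function names are assumptions made for this example, not AI Checker's actual data model, and treating a site with no Schema.org markup at all as 0 points is also an assumption.

```javascript
// Sketch of the scoring table above. The input shape is hypothetical.
function aiReadinessScore(scan) {
  let score = 0;

  // llms.txt: 15 pts for basic existence, 25 pts total for a complete version
  if (scan.llmsTxt === 'basic') score += 15;
  if (scan.llmsTxt === 'complete') score += 25;

  // Schema.org: tiered by number of fields found.
  // Assumption: no markup at all (null/undefined) scores 0.
  if (scan.schemaFields != null) {
    if (scan.schemaFields >= 6) score += 35;
    else if (scan.schemaFields >= 3) score += 20;
    else score += 10;
  }

  // An NDJSON feed outranks a plain JSON API; the two do not stack
  if (scan.hasNdjson) score += 25;
  else if (scan.hasJsonApi) score += 15;

  // Discoverability
  if (scan.hasSitemap) score += 8;
  if (scan.hasRobots) score += 7;

  return score;
}

// Map a 0-100 score onto the grade bands listed above
function grade(score) {
  if (score >= 90) return 'A+';
  if (score >= 80) return 'A';
  if (score >= 70) return 'B+';
  if (score >= 60) return 'B';
  if (score >= 50) return 'C';
  return 'D/F';
}

const example = {
  llmsTxt: 'basic', schemaFields: 4, hasNdjson: false,
  hasJsonApi: true, hasSitemap: true, hasRobots: false
};
console.log(aiReadinessScore(example), grade(aiReadinessScore(example))); // 58 C
```

In this example the site earns 15 (llms.txt) + 20 (Schema.org) + 15 (JSON API) + 8 (sitemap.xml) = 58 points, which falls in the C band.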

How to Use AI Checker

  1. Navigate to the AI Checker page
  2. Enter a competitor's website URL (e.g., https://example.com)
  3. Click "Analyze" to start the scan
  4. Review the technical scan results for each data format
  5. Check the AI Readiness Score to see their overall AI-friendliness
  6. Use SkuSync to generate better AI-Ready data and outperform competitors

Ready to Analyze Your Competitors?

Use AI Checker to discover AI readiness gaps and gain a competitive edge.

AI Checker SEO/GEO/AEO Templates (Audit Reference)

AI Checker does not generate files automatically. Use these templates as implementation references when a rule fails or warns.

/llms.txt

# Brand / Site Name

## Docs
- [About](https://example.com/about): Company overview
- [Policies](https://example.com/policies): Policy index

## Products
- [Catalog](https://example.com/products): Product index

/llms-full.md

# Brand Long-form Context

## Policies
...

## Product Knowledge
...

## Support FAQ
...

/products.ndjson

{"item_id":"SKU-1","title":"Demo Product","description":"Short product summary","url":"https://example.com/products/demo","brand":"Brand","seller_name":"Store","seller_url":"https://example.com","is_eligible_search":true}
{"item_id":"SKU-2","title":"Demo Product 2","description":"Short product summary","url":"https://example.com/products/demo-2","brand":"Brand","seller_name":"Store","seller_url":"https://example.com","is_eligible_search":true}

/feed.xml (RSS 2.0)

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Brand Updates</title>
    <link>https://example.com</link>
    <description>Latest updates</description>
  </channel>
</rss>

/robots.txt

User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

User-agent: Google-Extended
Allow: /

/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/demo</loc>
    <lastmod>2026-03-01T09:00:00Z</lastmod>
  </url>
</urlset>

Meta Robots / X-Robots-Tag

<meta name="robots" content="index,follow,max-snippet:160,max-image-preview:large" />
X-Robots-Tag: index, follow, max-snippet:160

Head Tags (Canonical / Hreflang / OG / Twitter)

<link rel="canonical" href="https://example.com/products/demo" />
<link rel="alternate" hreflang="en-US" href="https://example.com/en/products/demo" />
<link rel="alternate" hreflang="fr-FR" href="https://example.com/fr/products/demo" />
<meta property="og:title" content="Demo Product" />
<meta property="og:description" content="Demo summary" />
<meta property="og:image" content="https://example.com/og.jpg" />
<meta name="twitter:card" content="summary_large_image" />

JSON-LD (Organization / Product / Article)

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Brand",
  "sameAs": ["https://x.com/brand", "https://www.linkedin.com/company/brand"]
}
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to choose a product",
  "datePublished": "2026-03-01"
}
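
The heading above also lists Product. As a hedged sketch, a Product object could be assembled from a catalog row as shown below. The `row` field names and the `toProductJsonLd` helper are assumptions for illustration; the schema.org property names (`name`, `sku`, `brand`, `offers`, `priceCurrency`, `availability`) come from the schema.org Product and Offer types.

```javascript
// Sketch: build a schema.org Product object from a catalog row.
// The `row` shape is hypothetical, not SkuSync's output format.
function toProductJsonLd(row) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: row.title,
    description: row.description,
    image: row.image,
    sku: row.sku,
    brand: { '@type': 'Brand', name: row.brand },
    offers: {
      '@type': 'Offer',
      url: row.url,
      price: row.price,
      priceCurrency: row.currency,
      availability: row.inStock
        ? 'https://schema.org/InStock'
        : 'https://schema.org/OutOfStock'
    }
  };
}

const jsonLd = toProductJsonLd({
  title: 'Demo Product',
  description: 'Short product summary',
  image: 'https://example.com/og.jpg',
  sku: 'SKU-1',
  brand: 'Brand',
  url: 'https://example.com/products/demo',
  price: '29.99',
  currency: 'USD',
  inStock: true
});

// Embed the result in a <script type="application/ld+json"> tag
console.log(JSON.stringify(jsonLd, null, 2));
```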

/humans.txt

/* TEAM */
Team: Brand Team
Contact: team@example.com

/* SITE */
Language: en-US
Standards: HTML5, JSON-LD

/.well-known/security.txt

Contact: mailto:security@example.com
Expires: 2027-12-31T23:59:59Z
Preferred-Languages: en, zh

Shopify & SHOPLINE CSV Structure

SkuSync automatically detects and supports both Shopify and SHOPLINE export formats. Upload your CSV file - the tool will identify the platform and parse accordingly.

products_export.csv (CSV)
Handle,Title,Body (HTML),Vendor,Price,Image Src
classic-tee,"Classic Cotton Tee","<p>100% organic cotton</p>",BrandX,29.99,https://.../tee.jpg
slim-jeans,"Slim Fit Denim","<p>Indigo wash</p>",BrandX,89.00,https://.../jeans.jpg

Pro Tip: Automatic Platform Detection

SkuSync automatically detects whether your CSV is from Shopify or SHOPLINE by analyzing the header structure. No manual configuration needed - just upload and go!
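
To make the CSV structure concrete, here is a minimal sketch of how an export like the sample above could be parsed in the browser or in Node.js. It is not SkuSync's actual parser; `parseCsv` is a hypothetical helper that handles double-quoted fields containing commas, newlines, and escaped quotes.

```javascript
// Minimal CSV parser sketch: returns one object per data row, keyed by header.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                        // closing quote
      else field += ch;                                             // commas/newlines kept
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field); field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one line break
      row.push(field); field = '';
      rows.push(row); row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last row when the file does not end with a newline
  if (field !== '' || row.length > 0) { row.push(field); rows.push(row); }
  if (rows.length === 0) return [];

  const [header, ...records] = rows;
  return records
    .filter(r => !(r.length === 1 && r[0] === '')) // skip blank lines
    .map(r => Object.fromEntries(header.map((h, i) => [h, r[i] ?? ''])));
}

const csv =
  'Handle,Title,Body (HTML),Vendor,Price\n' +
  'classic-tee,"Classic, Cotton Tee","<p>100% organic\ncotton</p>",BrandX,29.99\n';

const products = parseCsv(csv);
console.log(products[0].Title); // Classic, Cotton Tee
```

All values come back as strings, so numeric columns such as Price would still need converting before use.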


3 Format Specifications

JSON Schema

The JSON Schema output provides a strict type definition for your product data. This is essential when using "Function Calling" or "Tools" with OpenAI's GPT-4 or Anthropic's Claude, because it tells the model exactly which parameters are valid.

{ "type": "object", "properties": { "title": { "type": "string" }, "price": { "type": "number" } ... } }

llms.txt

Following the proposed /llms.txt standard, this format uses simplified Markdown to present content. It strips HTML tags, script blocks, and CSS classes, leaving only the semantic content relevant for training or context windows.

# Product Catalog Context
## Classic Cotton Tee
ID: 10234
Price: $29.99
Description: 100% organic cotton, pre-shrunk, available in earth tones.
## Slim Fit Denim
ID: 10235
...
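
A conversion of this kind might look roughly like the sketch below. The `stripHtml` and `toLlmsTxt` helpers and the product field names (`id`, `title`, `price`, `bodyHtml`) are assumptions for illustration, and the regex-based tag stripping is deliberately naive; a production tool would more likely use a real HTML parser.

```javascript
// Naive HTML-to-text cleanup, for illustration only
function stripHtml(html) {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, ' ') // drop script/style blocks
    .replace(/<[^>]+>/g, ' ')                         // drop remaining tags
    .replace(/\s+/g, ' ')                             // collapse whitespace
    .trim();
}

// Render products as simplified Markdown in the layout shown above
function toLlmsTxt(products) {
  const lines = ['# Product Catalog Context'];
  for (const p of products) {
    lines.push(`## ${p.title}`);
    lines.push(`ID: ${p.id}`);
    lines.push(`Price: $${p.price}`);
    lines.push(`Description: ${stripHtml(p.bodyHtml)}`);
  }
  return lines.join('\n');
}

console.log(toLlmsTxt([
  {
    id: 10234,
    title: 'Classic Cotton Tee',
    price: '29.99',
    bodyHtml: '<p class="desc">100% organic cotton</p><script>track()</script>'
  }
]));
```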

NDJSON (Newline Delimited JSON)

NDJSON is the preferred format for bulk data ingestion into vector databases like Pinecone or Weaviate. Each line is a standalone valid JSON object, allowing for stream processing without loading the entire dataset into memory.
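
The line-per-record property is easy to see in code. The following sketch uses two hypothetical helpers, `toNdjson` and `parseNdjson`; the parser accepts any iterable of lines, so in Node.js the lines could equally come from a `readline` interface over a file stream rather than from an in-memory string.

```javascript
// Serialize: one JSON object per line. JSON.stringify escapes newlines
// inside strings, so each record always stays on a single line.
function toNdjson(records) {
  return records.map(r => JSON.stringify(r)).join('\n') + '\n';
}

// Parse lazily: yields one record at a time from any iterable of lines
function* parseNdjson(lines) {
  for (const line of lines) {
    if (line.trim() !== '') yield JSON.parse(line);
  }
}

const feed = toNdjson([
  { id: 'SKU-1', title: 'Demo Product' },
  { id: 'SKU-2', title: 'Demo Product 2' }
]);

for (const product of parseNdjson(feed.split('\n'))) {
  console.log(product.id, product.title);
}
```

Because each line parses on its own, a malformed record can be skipped or reported without invalidating the rest of the feed.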

4 Usage Guide

Learn how to integrate SkuSync outputs into your AI-powered workflows and applications.

OpenAI Integration

Use the generated JSON Schema with OpenAI's Function Calling for structured product queries.

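// Using JSON Schema with OpenAI tool (function) calling
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
// Assumed path: wherever you saved the JSON Schema generated by SkuSync
const jsonSchema = JSON.parse(fs.readFileSync("schema.json", "utf-8"));

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Find red shoes under $50" }],
  // `tools` replaces the deprecated `functions` parameter
  tools: [{
    type: "function",
    function: { name: "search_products", parameters: jsonSchema }
  }]
});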

Vector Database Import

Import NDJSON output into Pinecone or other vector databases for semantic search.

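// Importing NDJSON to Pinecone
// `pineconeIndex` and `embed` are supplied by your own setup
const fs = require('fs');
const readline = require('readline');

async function importProducts(path, pineconeIndex, embed) {
  // Stream the file line by line instead of loading it all into memory
  const lines = readline.createInterface({
    input: fs.createReadStream(path, 'utf-8'),
    crlfDelay: Infinity
  });

  for await (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const product = JSON.parse(line);
    // The upsert argument shape varies between Pinecone client versions;
    // adjust this call to match the version you have installed
    await pineconeIndex.upsert({
      vectors: [{
        id: product.id,
        metadata: product,
        values: await embed(product.triples.join(' '))
      }]
    });
  }
}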

RAG Pipeline Setup

Build a Retrieval-Augmented Generation pipeline using llms.txt as context.

// RAG Pipeline with llms.txt context
async function queryProductCatalog(userQuery) {
  // 1. Retrieve relevant context from llms.txt
  const context = retrieveContext(userQuery, llmsTxtContent);

  // 2. Augment query with context
  const prompt = `Context:
${context}

Question: ${userQuery}`;

  // 3. Generate response
  return await llm.generate(prompt);
}
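
The snippet above leaves `retrieveContext` undefined. One very simple way to fill it in, shown here only as a sketch, is to split the llms.txt content into its `## ` sections and rank them by how many query words each contains. This keyword overlap is an assumption chosen for brevity (it matches substrings and only handles ASCII words); a real pipeline would more likely use embeddings and a vector index.

```javascript
// Hypothetical retrieveContext: keyword-overlap ranking over "## " sections
function retrieveContext(userQuery, llmsTxtContent, topK = 2) {
  const words = userQuery.toLowerCase().match(/[a-z0-9]+/g) || [];
  // Split before every line that starts a "## " section
  const sections = llmsTxtContent.split(/\n(?=## )/);

  return sections
    .map(text => {
      const lower = text.toLowerCase();
      const score = words.filter(w => lower.includes(w)).length;
      return { text, score };
    })
    .filter(s => s.score > 0)          // drop sections with no matching words
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, topK)
    .map(s => s.text.trim())
    .join('\n\n');
}

const llmsTxtContent = [
  '# Product Catalog Context',
  '## Classic Cotton Tee',
  'ID: 10234',
  'Description: 100% organic cotton',
  '## Slim Fit Denim',
  'ID: 10235',
  'Description: Indigo wash'
].join('\n');

console.log(retrieveContext('organic cotton tee', llmsTxtContent));
```

For the sample query, only the Classic Cotton Tee section contains any of the query words, so it is the only section returned.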

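SHOPLINE 2.0: Placing txt / ndjson Files at the Site Root

The following is compiled from Shopline Community tutorial text and screenshots. It applies to mapping files such as ads.txt, llms.txt, and products.ndjson to the site's root path.

1. Upload the txt file in the Admin panel

Path: Settings -> File library -> Upload file -> Add file

[Screenshot: Shopline file library upload screen]

2. Copy the file link

[Screenshot: Shopline copy file link screen]

3. Create a redirect pointing to the file link

Notes:

  • In multi-domain setups, the redirect points the root path on every domain to the same specified address.
  • The redirect target accepts a full URL and is not limited to relative paths.

3.1. Domains -> Redirects -> Manage redirects

[Screenshot: Shopline manage redirects page]

3.2. Add a redirect and set the root path (e.g., ads.txt) to redirect to the link copied in step 2

[Screenshot: Shopline add redirect configuration example]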

5 Best Practices

Data Quality Checklist

  • Ensure CSV is UTF-8 encoded for proper character handling
  • Verify required columns (Handle, Title*, Vendor, Tags, Collections)
  • Check for duplicate SKUs to avoid data conflicts
  • Validate image URLs are accessible and properly formatted
  • Remove any unnecessary HTML from product descriptions
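
Several of these checks can be automated before upload. The sketch below is one possible approach, not a SkuSync feature: `checkCatalog` is a hypothetical helper, the `SKU` column name is an assumption (pass your export's actual column name), and the URL check only validates format, not whether the image is actually reachable.

```javascript
// Required columns as listed in the checklist above
const REQUIRED_COLUMNS = ['Handle', 'Title', 'Vendor', 'Tags', 'Collections'];

// True when the value parses as an absolute http(s) URL
function isHttpUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

// rows: array of objects keyed by CSV header (one object per product row)
function checkCatalog(rows, skuColumn = 'SKU') {
  const header = rows.length > 0 ? Object.keys(rows[0]) : [];
  const missingColumns = REQUIRED_COLUMNS.filter(c => !header.includes(c));

  const seen = new Set();
  const duplicates = new Set();
  for (const row of rows) {
    const sku = row[skuColumn];
    if (!sku) continue;               // rows without a SKU are ignored here
    if (seen.has(sku)) duplicates.add(sku);
    seen.add(sku);
  }

  const badImageUrls = rows
    .map(r => r['Image Src'])
    .filter(u => u && !isHttpUrl(u)); // format check only

  return { missingColumns, duplicateSkus: [...duplicates], badImageUrls };
}

const report = checkCatalog([
  { Handle: 'a', Title: 'A', Vendor: 'V', Tags: '', SKU: 'S1', 'Image Src': 'https://example.com/a.jpg' },
  { Handle: 'b', Title: 'B', Vendor: 'V', Tags: '', SKU: 'S1', 'Image Src': 'not a url' }
]);
console.log(report);
```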

Large Dataset Strategy

  • Split files larger than 10,000 products for better performance
  • Use batch mode for multiple files to merge results efficiently
  • Monitor browser memory usage when processing large datasets
  • Consider using NDJSON for streaming large datasets to databases
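
If you prepare files with a script, splitting at the 10,000-product mark mentioned above takes only a few lines. The `chunk` helper below is a hypothetical sketch for that pre-processing step, not part of SkuSync.

```javascript
// Yield consecutive slices of at most `size` items
function* chunk(items, size = 10000) {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Placeholder catalog of 25,000 products
const catalog = Array.from({ length: 25000 }, (_, i) => ({ id: `SKU-${i + 1}` }));
const batchSizes = [...chunk(catalog)].map(batch => batch.length);
console.log(batchSizes); // [ 10000, 10000, 5000 ]
```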

Integration Best Practices

  • Store NDJSON in version control for data tracking and rollback
  • Create automated CI/CD pipelines for regular data updates
  • Set up scheduled syncs to keep AI systems updated with latest products
  • Deploy llms.txt to your website root for AI crawler discovery

Troubleshooting

What's the maximum file size supported?
SkuSync supports CSV files up to 50MB per file. For larger datasets, split your data into multiple files and process them together in batch mode. The results will be automatically merged.

How are special characters in product titles handled?
The parser correctly handles quoted fields containing commas, newlines, and special characters. Ensure your CSV uses proper quoting (double quotes) around fields that contain special characters.

How can I improve conversion speed for large files?
For optimal performance with large files:
• Use a modern browser (Chrome or Edge recommended)
• Close unnecessary browser tabs to free up memory
• Ensure your device has sufficient RAM (4GB+ recommended)
• Use batch mode for multiple smaller files instead of one large file

Is my data sent to any server?
No. All processing happens locally in your browser using JavaScript. Your data never leaves your device. You can verify this by disconnecting from the internet after loading the page; the tool will continue to work offline.

My file is not parsing correctly
Ensure you are using the default CSV encoding (UTF-8). Some Excel exports use UTF-16LE, which may cause issues. Try opening your CSV in a text editor and saving it specifically as UTF-8.

Images are missing in the output
SkuSync looks for the Image Src column. If your export uses a different header (e.g., from a custom app), rename the column header to Image Src before uploading.

Can I customize the triples extraction?
Yes! Use the "Customize Output Configuration" panel to select which fields should be extracted as semantic triples. You can include or exclude Vendor, Tags, Collections, Title, and Subtitle fields based on your needs.

What browsers are supported?
SkuSync works on all modern browsers including Chrome, Firefox, Safari, and Edge. For the best performance, we recommend using the latest version of Chrome or Edge with JavaScript enabled.

Does the JSON Schema comply with Google Shopping standards?

Yes. The JSON Schema output from SkuSync is fully compliant with the Google Merchant Center Product Feed Specification, which is the official standard for Google Shopping ads and free listings.

The generated schema includes all required fields per Google's specification:

  • id: Unique product identifier
  • title: Product title
  • description: Product description
  • link: Product landing page URL
  • image_link: Main product image URL
  • availability: Stock status
  • price: Price with currency code
  • condition: Item condition
  • brand: Brand name

The schema also includes recommended fields such as gtin (GTIN/EAN/UPC), mpn (Manufacturer Part Number), sku, and variant data.

This ensures your product feed can be uploaded to Google Merchant Center for Shopping ads and free listings. See the Google Merchant Center Product Feed Specification for details.
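
Before uploading a feed, a quick script can flag items that lack any of the fields listed above. This is a minimal sketch: the field list is copied from this answer rather than checked against Google's specification, and `missingRequiredFields` is a hypothetical helper.

```javascript
// Field names taken from the list in the answer above
const REQUIRED_FIELDS = [
  'id', 'title', 'description', 'link', 'image_link',
  'availability', 'price', 'condition', 'brand'
];

// Return the required fields that are absent or empty on a feed item
function missingRequiredFields(item) {
  return REQUIRED_FIELDS.filter(
    f => item[f] === undefined || item[f] === null || item[f] === ''
  );
}

const item = { id: 'SKU-1', title: 'Demo Product', price: '29.99 USD' };
console.log(missingRequiredFields(item));
// [ 'description', 'link', 'image_link', 'availability', 'condition', 'brand' ]
```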