
SkuSyncDocs

Introduction

SkuSync is a specialized data transformation tool designed to convert standard e-commerce product exports into formats optimized for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines.

Why use SkuSync?

Raw HTML and CSV data are noisy for AI models. SkuSync strips away presentation layers and structures your product catalog into semantic data that AI agents can "read" efficiently, reducing token usage and improving context understanding.


1. Getting Started

Follow these steps to generate your first AI-ready dataset. No coding knowledge is required.


1. Export Data

Go to your SHOPLINE admin panel, navigate to Products, and export your catalog as "All products".


2. Upload & Convert

Drag your CSV into SkuSync. The browser-based engine parses it instantly without server uploads.

SHOPLINE CSV Structure

SkuSync expects a standard SHOPLINE export format. Ensure your CSV file contains the following headers for optimal parsing:

products_export.csv
Handle,Title,Body (HTML),Vendor,Price,Image Src
classic-tee,"Classic Cotton Tee","<p>100% organic cotton</p>",BrandX,29.99,https://.../tee.jpg
slim-jeans,"Slim Fit Denim","<p>Indigo wash</p>",BrandX,89.00,https://.../jeans.jpg
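The quoting rules above matter: titles and HTML descriptions routinely contain commas. As a rough sketch of how such a row can be split into fields (an illustration only, not SkuSync's actual parser), with a hypothetical example.com image URL standing in for a real one:

```javascript
// Minimal CSV row parser that handles double-quoted fields and
// "" escape sequences. Illustration only -- not SkuSync's real engine.
function parseCsvRow(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; } // escaped quote
      else if (ch === '"') { inQuotes = false; }
      else { current += ch; }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const row = parseCsvRow(
  'classic-tee,"Classic Cotton Tee","<p>100% organic cotton</p>",BrandX,29.99,https://cdn.example.com/tee.jpg'
);
// row[1] === 'Classic Cotton Tee', row[4] === '29.99'
```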
Pro Tip: Image Handling

SkuSync automatically filters for the primary product image (the first image in the list) to keep your JSON payloads lightweight for vision models.
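Assuming an export shape where each image is its own row keyed by Handle (an assumption for illustration, not a documented SkuSync internal), the first-image filtering could look like:

```javascript
// Sketch: keep only each product's first image row to slim payloads.
// Field names (Handle, 'Image Src') mirror the sample CSV above.
function primaryImages(rows) {
  const primary = new Map();
  for (const row of rows) {
    if (row['Image Src'] && !primary.has(row.Handle)) {
      primary.set(row.Handle, row['Image Src']); // first image wins
    }
  }
  return primary;
}

const images = primaryImages([
  { Handle: 'classic-tee', 'Image Src': 'https://cdn.example.com/tee-front.jpg' },
  { Handle: 'classic-tee', 'Image Src': 'https://cdn.example.com/tee-back.jpg' }
]);
// images.get('classic-tee') keeps only the front image
```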


2. Format Specifications

JSON Schema

The JSON Schema output provides a strict type definition for your product data. This is essential when using "Function Calling" or "Tools" with OpenAI's GPT-4 or Anthropic's Claude, ensuring the model generates valid parameters.

{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "price": { "type": "number" },
    ...
  }
}
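As a rough illustration of what the schema fragment above enforces, here is a hand-rolled structural check (for real validation use a JSON Schema library such as Ajv; this is not SkuSync's output logic):

```javascript
const productSchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    price: { type: 'number' }
  }
};

// Naive check: present properties must match their declared types.
// (A real validator also handles "required", nesting, formats, etc.)
function matchesSchema(obj, schema) {
  if (schema.type === 'object') {
    if (typeof obj !== 'object' || obj === null) return false;
    return Object.entries(schema.properties).every(
      ([key, sub]) => !(key in obj) || matchesSchema(obj[key], sub)
    );
  }
  return typeof obj === schema.type;
}

const ok = matchesSchema({ title: 'Classic Cotton Tee', price: 29.99 }, productSchema);
// ok === true; a title of the wrong type would fail the check
```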

llms.txt

Following the proposed llms.txt standard, this format presents content as simplified Markdown. It strips HTML tags, script blocks, and CSS classes, leaving only the semantic content relevant for training or context windows.

# Product Catalog Context
## Classic Cotton Tee
ID: 10234
Price: $29.99
Description: 100% organic cotton, pre-shrunk, available in earth tones.
## Slim Fit Denim
ID: 10235
...
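A minimal sketch of how output like the above could be generated from parsed products. The field names (`id`, `title`, `price`, `description`) are assumptions for illustration, not SkuSync's internal shape:

```javascript
// Render product objects as llms.txt-style Markdown, stripping HTML
// from descriptions along the way.
function toLlmsTxt(products) {
  const lines = ['# Product Catalog Context'];
  for (const p of products) {
    const plain = p.description.replace(/<[^>]+>/g, ''); // drop HTML tags
    lines.push(
      `## ${p.title}`,
      `ID: ${p.id}`,
      `Price: $${p.price.toFixed(2)}`,
      `Description: ${plain}`
    );
  }
  return lines.join('\n');
}

const txt = toLlmsTxt([
  {
    id: '10234',
    title: 'Classic Cotton Tee',
    price: 29.99,
    description: '<p>100% organic cotton</p>'
  }
]);
```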

NDJSON (Newline Delimited JSON)

NDJSON is the preferred format for bulk data ingestion into vector databases like Pinecone or Weaviate. Each line is a standalone valid JSON object, allowing for stream processing without loading the entire dataset into memory.
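The line-per-object property is easy to demonstrate with a round trip (a sketch; the field names are taken from the sample data above):

```javascript
// NDJSON round trip: one standalone JSON object per line.
const products = [
  { id: '10234', title: 'Classic Cotton Tee', price: 29.99 },
  { id: '10235', title: 'Slim Fit Denim', price: 89.0 }
];

// Serialize: each object becomes exactly one line.
const ndjson = products.map(p => JSON.stringify(p)).join('\n');

// Parse back line by line. Because every line parses independently,
// a consumer can stream a huge file without holding it all in memory.
const parsed = ndjson
  .split('\n')
  .filter(Boolean)
  .map(line => JSON.parse(line));
```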

3. Usage Guide

Learn how to integrate SkuSync outputs into your AI-powered workflows and applications.

OpenAI Integration

Use the generated JSON Schema with OpenAI's Function Calling for structured product queries.

// Using JSON Schema with OpenAI Function Calling
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Find red shoes under $50" }],
  functions: [{
    name: "search_products",
    parameters: jsonSchema // Use generated JSON Schema
  }]
});

Vector Database Import

Import NDJSON output into Pinecone or other vector databases for semantic search.

// Importing NDJSON into Pinecone (array-of-records upsert shape,
// as in recent Pinecone JavaScript clients -- check your client version).
const fs = require('fs');

const lines = fs.readFileSync('products.ndjson', 'utf-8').split('\n');

// Build all records first, then upsert in batches: one network call
// per vector is far slower than batched upserts.
const records = [];
for (const line of lines) {
  if (!line.trim()) continue;
  const product = JSON.parse(line);
  records.push({
    id: product.id,
    values: await embed(product.triples.join(' ')), // your embedding function
    metadata: product
  });
}

for (let i = 0; i < records.length; i += 100) {
  await pineconeIndex.upsert(records.slice(i, i + 100));
}

RAG Pipeline Setup

Build a Retrieval-Augmented Generation pipeline using llms.txt as context.

// RAG Pipeline with llms.txt context
async function queryProductCatalog(userQuery) {
  // 1. Retrieve relevant context from llms.txt
  const context = retrieveContext(userQuery, llmsTxtContent);

  // 2. Augment query with context
  const prompt = `Context:
${context}

Question: ${userQuery}`;

  // 3. Generate response
  return await llm.generate(prompt);
}
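The `retrieveContext` helper above is left abstract. A minimal stand-in using naive keyword overlap over llms.txt `##` sections (a hypothetical sketch; a production pipeline would use embeddings and a vector store) might look like:

```javascript
// Hypothetical retrieveContext: score each "## " section of an llms.txt
// document by how many query terms it contains, return the top matches.
function retrieveContext(query, llmsTxt, topK = 2) {
  const sections = llmsTxt
    .split(/^## /m)
    .slice(1) // drop the preamble before the first section
    .map(s => '## ' + s.trim());
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return sections
    .map(section => ({
      section,
      score: terms.filter(t => section.toLowerCase().includes(t)).length
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.section)
    .join('\n\n');
}

const doc =
  '# Product Catalog Context\n## Classic Cotton Tee\nPrice: $29.99\n## Slim Fit Denim\nPrice: $89.00';
const ctx = retrieveContext('cotton tee under $50', doc, 1);
// ctx contains only the Classic Cotton Tee section
```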

4. Best Practices

Data Quality Checklist

  • Ensure the CSV is UTF-8 encoded for proper character handling
  • Verify required columns are present (Handle, Title, Vendor, Tags, Collections)
  • Check for duplicate SKUs to avoid data conflicts
  • Validate that image URLs are accessible and properly formatted
  • Remove unnecessary HTML from product descriptions
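The duplicate-SKU check can be scripted before uploading. A minimal sketch (the `Handle` field name is an assumption based on the sample CSV):

```javascript
// Report any key that appears on more than one row.
function findDuplicates(rows, key = 'Handle') {
  const seen = new Set();
  const dupes = new Set();
  for (const row of rows) {
    if (seen.has(row[key])) dupes.add(row[key]);
    seen.add(row[key]);
  }
  return [...dupes];
}

const dupes = findDuplicates([
  { Handle: 'classic-tee' },
  { Handle: 'slim-jeans' },
  { Handle: 'classic-tee' }
]);
// dupes === ['classic-tee']
```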

Large Dataset Strategy

  • Split files larger than 10,000 products for better performance
  • Use batch mode for multiple files to merge results efficiently
  • Monitor browser memory usage when processing large datasets
  • Consider using NDJSON for streaming large datasets to databases
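Splitting is a few lines of code once the catalog is parsed. A sketch of chunking a product array into at most 10,000 items per file:

```javascript
// Split an array into chunks of at most `size` items, one per output file.
function chunk(products, size = 10000) {
  const out = [];
  for (let i = 0; i < products.length; i += size) {
    out.push(products.slice(i, i + size));
  }
  return out;
}

const parts = chunk(new Array(25000).fill({}), 10000);
// parts.length === 3 (10,000 + 10,000 + 5,000 products)
```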

Integration Best Practices

  • Store NDJSON in version control for data tracking and rollback
  • Create automated CI/CD pipelines for regular data updates
  • Set up scheduled syncs to keep AI systems updated with the latest products
  • Deploy llms.txt to your website root for AI crawler discovery

5. Troubleshooting

What's the maximum file size supported?
SkuSync supports CSV files up to 50MB per file. For larger datasets, use the batch mode to split your data into multiple files and process them together. The results will be automatically merged.
How are special characters in product titles handled?
The parser correctly handles quoted fields containing commas, newlines, and special characters. Ensure your CSV uses proper quoting (double quotes) around fields that contain special characters.
How can I improve conversion speed for large files?
For optimal performance with large files:
• Use a modern browser (Chrome or Edge recommended)
• Close unnecessary browser tabs to free up memory
• Ensure your device has sufficient RAM (4GB+ recommended)
• Use batch mode for multiple smaller files instead of one large file
Is my data sent to any server?
No. All processing happens locally in your browser using JavaScript. Your data never leaves your device. You can verify this by disconnecting from the internet after loading the page—the tool will continue to work offline.
My file is not parsing correctly
Ensure you are using the default CSV encoding (UTF-8). Some Excel exports use UTF-16LE which may cause issues. Try opening your CSV in a text editor and saving it specifically as UTF-8.
Images are missing in the output
SkuSync looks for the Image Src column. If your export uses a different header (e.g., from a custom app), rename the column header to Image Src before uploading.
Can I customize the triples extraction?
Yes! Use the "Customize Output Configuration" panel to select which fields should be extracted as semantic triples. You can include or exclude Vendor, Tags, Collections, Title, and Subtitle fields based on your needs.
What browsers are supported?
SkuSync works on all modern browsers including Chrome, Firefox, Safari, and Edge. For the best performance, we recommend using the latest version of Chrome or Edge with JavaScript enabled.