
Data Format

This document provides a comprehensive overview of the data format used by Octaprice when delivering product information. Understanding the structure and content of this data is essential for integrating it into your systems and workflows.

Data that you download from the Products page or receive via Data Delivery Setup follows this schema. It is available in JSON, CSV, and JSONL formats.
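
As an illustration of consuming the JSONL format, the following minimal Python sketch reads one product record per line. It assumes each non-empty line holds a single JSON object with the field names described in this document; the helper name `iter_jsonl_records` and the sample records are hypothetical and not part of any Octaprice tooling.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one product record per non-empty JSONL line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Hypothetical sample data, built from the example values in this document.
sample_lines = [
    '{"url": "https://store.example.com/product/12345", '
    '"name": "Super Widget 3000", "price": "149.99", "currency": "EUR"}',
    "",
    '{"url": "https://store.example.com/product/67890", '
    '"name": "Super Widget 2000", "availability": "OutOfStock"}',
]

records = list(iter_jsonl_records(sample_lines))
for record in records:
    print(record["url"], record.get("price"))
```

The same function accepts an open file object, for example `iter_jsonl_records(open("products.jsonl", encoding="utf-8"))`, where the file name is a placeholder.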


Note: While we strive to provide accurate and comprehensive data, we can only guarantee the quality of the following data points:

  • price
  • regularPrice
  • currency
  • name
  • url

Other fields may vary in completeness and accuracy due to variations in source websites.
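
If you want your pipeline to treat guaranteed and best-effort fields differently, one possible approach is to split each record by field name. This is only a sketch: the constant `GUARANTEED_FIELDS` mirrors the list above, and the function name `split_by_guarantee` is a hypothetical helper.

```python
# Field names taken from the quality note above.
GUARANTEED_FIELDS = ("price", "regularPrice", "currency", "name", "url")


def split_by_guarantee(record: dict) -> tuple[dict, dict]:
    """Return (guaranteed, best_effort) views of a product record."""
    guaranteed = {k: v for k, v in record.items() if k in GUARANTEED_FIELDS}
    best_effort = {k: v for k, v in record.items() if k not in GUARANTEED_FIELDS}
    return guaranteed, best_effort


# Hypothetical record built from the example values in this document.
guaranteed, best_effort = split_by_guarantee(
    {
        "url": "https://store.example.com/product/12345",
        "name": "Super Widget 3000",
        "price": "149.99",
        "currency": "EUR",
        "sku": "SW3000-XL",
    }
)
print(sorted(guaranteed))
print(sorted(best_effort))
```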

Data Schema Overview

The data schema consists of multiple fields that provide detailed information about each product. Some fields have sub-fields to capture more granular data. We offer three different schemas depending on your subscription plan: Basic, Standard, and Enterprise.


Basic Schema

The Basic Schema includes the following fields:

  • url
    • Type: string
    • Example: https://store.example.com/product/12345
    • Description: The main URL of the product page. This is the final URL after any redirects.
    • Required: Yes
    • Notes: If the page was not reached or there is no product data, this field will still be included along with the metadata field containing dateDownloaded.
  • name
    • Type: string
    • Example: Super Widget 3000
    • Description: The name of the product as it appears on the page. The string is trimmed of any leading or trailing whitespace.
  • price
    • Type: string
    • Example: 149.99
    • Description: The current selling price of the product.
    • Standards: Should not be equal to or higher than regularPrice (if regularPrice is provided).
    • Format: No thousands separator. Use a full stop (.) as the decimal separator.
    • Notes: Violations of the above standards produce a warning.
  • regularPrice
    • Type: string
    • Example: 199.99
    • Description: The original or list price of the product before any discounts.
    • Standards: Should not be equal to or lower than price.
    • Format: No thousands separator. Use a full stop (.) as the decimal separator.
    • Notes: This field is only included if the original price is explicitly mentioned on the product page. Violations of the above standards produce a warning.
  • currency
    • Type: string
    • Example: EUR
    • Description: The currency code associated with the price, following the ISO 4217 standard.
  • availability
    • Type: string
    • Example: OutOfStock
    • Description: The availability status of the product.
    • Possible Values: InStock, OutOfStock
  • sku
    • Type: string
    • Example: SW3000-XL
    • Description: The Stock Keeping Unit (SKU), a merchant-specific identifier for the product.
  • brand
    • Type: object
    • Example:
      {
        "name": "Super Widget"
      }
    • Description: Information about the brand associated with the product.
    • Sub-fields:
      • name:
        • Type: string
        • Example: Super Widget
        • Description: The name of the brand.
  • description
    • Type: string
    • Example: "The Super Widget 3000 is the latest innovation from WidgetCorp. Features include high-speed processing and a sleek design. Perfect for tech enthusiasts."
    • Description: The main description of the product, containing the most useful information.
    • Notes: The string is trimmed of any leading or trailing whitespace. Line breaks are included. There is no length limit. Unicode characters are not normalized. Descriptions from different parts of the page are not concatenated.
  • reviewValue
    • Type: float
    • Example: 4.5
    • Description: The average rating value of the product.
  • categoryName
    • Type: string
    • Example: Electronics
    • Description: The name of the category to which the product belongs.
  • extractedDate
    • Type: string
    • Example: 2024-09-23T10:15:30Z
    • Description: The timestamp when the product data was downloaded.
    • Standards: Timezone: UTC. Format: ISO 8601 (YYYY-MM-DDThh:mm:ssZ).
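
The price and timestamp rules above can be checked on the consumer side. The sketch below is one possible way to do that in Python, not Octaprice's own validation: the function names `price_warnings` and `parse_extracted_date` and the warning texts are hypothetical. It uses `Decimal` so that prices are compared exactly rather than as floats.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

# Digits with an optional fractional part: no thousands separator,
# full stop as the decimal separator.
PRICE_FORMAT = re.compile(r"\d+(\.\d+)?")


def price_warnings(record: dict) -> list[str]:
    """Return warnings for price/regularPrice values that break the documented rules."""
    warnings = []
    price = record.get("price")
    regular = record.get("regularPrice")
    for field, value in (("price", price), ("regularPrice", regular)):
        if value is not None and not PRICE_FORMAT.fullmatch(value):
            warnings.append(f"{field} is not a plain decimal string: {value!r}")
    # Compare only when both values are present and well formed.
    if price is not None and regular is not None and not warnings:
        if Decimal(price) >= Decimal(regular):
            warnings.append("price should be lower than regularPrice")
    return warnings


def parse_extracted_date(value: str) -> datetime:
    """Parse an extractedDate such as 2024-09-23T10:15:30Z into an aware datetime."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


print(price_warnings({"price": "149.99", "regularPrice": "199.99"}))
print(parse_extracted_date("2024-09-23T10:15:30Z"))
```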

Standard Schema

The Standard Schema includes all the fields in the Basic Schema, plus:

  • mainImage
    • Type: object
    • Example:
      {
        "url": "https://store.example.com/images/product123_main.jpg"
      }
    • Description: Details of the main image of the product.
    • Sub-fields:
      • url:
        • Type: string
        • Example: https://store.example.com/images/product123_main.jpg
        • Description: The URL of the main image.
        • Required: Yes
        • Notes: Data URLs are not allowed.
  • color
    • Type: string
    • Example: Red
    • Description: The color of the product.
  • size
    • Type: string
    • Example: Large
    • Description: The size or dimensions of the product, relevant for items like garments, shoes, etc.
  • style
    • Type: string
    • Example: Modern
    • Description: The style of the product, pertinent to items like clothing or accessories.
  • reviewCount
    • Type: integer
    • Example: 150
    • Description: The total number of reviews.
  • variants
    • Type: array of objects
    • Example:
      [
        {
          "color": "Red",
          "size": "Large"
        }
      ]
    • Description: Variants of the product, such as different colors or sizes.
    • Sub-fields:
      • color:
        • Type: string
        • Example: Red
        • Description: The color variant.
      • size:
        • Type: string
        • Example: Large
        • Description: The size variant.
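
For tabular analysis it can be convenient to expand the variants array into one row per variant. The sketch below shows one way to do this; the function name `flatten_variants` and the choice of columns are assumptions made for illustration and do not describe how Octaprice produces its CSV output.

```python
def flatten_variants(record: dict) -> list[dict]:
    """Return one flat row per variant, or a single row if there are no variants."""
    base = {"url": record.get("url"), "name": record.get("name")}
    # Fall back to one empty variant so every product yields at least one row.
    variants = record.get("variants") or [{}]
    return [
        {**base, "color": variant.get("color"), "size": variant.get("size")}
        for variant in variants
    ]


# Hypothetical record built from the example values in this document.
rows = flatten_variants(
    {
        "url": "https://store.example.com/product/12345",
        "name": "Super Widget 3000",
        "variants": [
            {"color": "Red", "size": "Large"},
            {"color": "Blue", "size": "Large"},
        ],
    }
)
for row in rows:
    print(row)
```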

Enterprise Schema (Complete Version)

The Enterprise Schema includes all available fields and provides the most comprehensive data set. The fields described below are exclusive to the Enterprise Schema.

  • canonicalUrl
    • Type: string
    • Example: https://store.example.com/products/widget
    • Description: The canonical URL of the product page as specified by the website.
  • currencyRaw
    • Type: string
    • Example:
    • Description: The currency symbol as it appears on the page, without any post-processing.
  • mpn
    • Type: string
    • Example: MPN-987654321
    • Description: The Manufacturer Part Number (MPN) of the product, which is consistent across different sellers.
  • gtin
    • Type: array of objects
    • Example:
      [
        {
          "type": "gtin13",
          "value": "0123456789012"
        }
      ]
    • Description: A list of standardized Global Trade Item Numbers (GTINs) associated with the product.
    • Sub-fields:
      • type:
        • Type: string
        • Possible Values: gtin13, gtin8, gtin14, isbn10, isbn13, ismn, issn, upc
        • Description: The type of GTIN.
      • value:
        • Type: string
        • Example: 0123456789012
        • Description: The numerical value of the GTIN.
        • Standards: Only numerical characters are allowed. Must match the pattern ^[0-9]+$. Violations produce a warning.
  • breadcrumbs
    • Type: array of objects
    • Example:
      [
        {
          "name": "Super Widget",
          "url": "https://store.example.com/products/super-widget"
        }
      ]
    • Description: The breadcrumb navigation of the product page, from the most specific category to the most general.
    • Sub-fields:
      • name:
        • Type: string
        • Example: Super Widget
        • Description: The name of the category.
      • url:
        • Type: string
        • Example: https://store.example.com/products/super-widget
        • Description: The URL of the category.
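
The gtin rules above (a fixed set of type values and a digits-only value) are simple to check after delivery. The following Python sketch is a hypothetical consumer-side check, not Octaprice's own validation; the function name `gtin_warnings` and the warning texts are assumptions.

```python
import re

# Type values and value pattern as listed for the gtin field above.
GTIN_TYPES = {"gtin13", "gtin8", "gtin14", "isbn10", "isbn13", "ismn", "issn", "upc"}
GTIN_VALUE = re.compile(r"[0-9]+")


def gtin_warnings(entries: list[dict]) -> list[str]:
    """Return warnings for gtin entries with an unknown type or a non-numeric value."""
    warnings = []
    for index, entry in enumerate(entries):
        gtin_type = entry.get("type")
        value = entry.get("value")
        if gtin_type not in GTIN_TYPES:
            warnings.append(f"gtin[{index}]: unknown type {gtin_type!r}")
        if not isinstance(value, str) or not GTIN_VALUE.fullmatch(value):
            warnings.append(f"gtin[{index}]: value is not numeric: {value!r}")
    return warnings


print(gtin_warnings([{"type": "gtin13", "value": "0123456789012"}]))
print(gtin_warnings([{"type": "ean", "value": "0123-456"}]))
```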

The Enterprise Schema is fully customizable, so fields not mentioned here can be evaluated for feasibility and pricing on a case-by-case basis. Please reach out to our sales team to receive a quote.


Utilizing the Data

The structured data provided by Octaprice can be integrated into your systems for various purposes, such as:

  • Price comparison and monitoring.
  • Inventory management.
  • Market analysis.
  • Product catalog enhancement.