
Documentation

v2.0.0

Install

pip install coriro

This installs the core package with NumPy and Pillow. Optional features require extra dependencies: if the CNN dependencies are missing, smooth=True raises an error with an install hint; if text or accent dependencies are unavailable, those passes return None.

To install everything upfront:

pip install coriro[all]

Text detection: Coriro uses onnxtr[cpu] (neural text detection) by default and falls back to pytesseract when needed. The fallback requires the tesseract-ocr system binary (brew install tesseract / apt install tesseract-ocr).

Requires Python 3.10+. No GPU needed.

Quick start

from coriro import measure

m = measure("image.png")

# Get results in the format your pipeline needs
m.to_json()    # Raw measurement JSON (schema)
m.to_xml()     # XML for context blocks
m.to_prompt()  # Human-readable natural language

With all features enabled:

from coriro import measure, GridSize

m = measure(
    "image.png",
    grid=GridSize.GRID_3X3,      # 9-region spatial grid
    smooth=True,                 # CNN pixel stabilization (requires torch + timm)
    include_text=True,           # extract text foreground colors
    include_accents=True,        # detect solid accent regions (buttons, icons)
    max_output_colors=8,        # larger palette
)

# Consolidated JSON with spatial data
from coriro.runtime import to_tool_output
json_str = to_tool_output(m, consolidated=True, include_spatial=True)

API

measure()

The primary entry point. Accepts a file path, a pathlib.Path, or a NumPy uint8 RGB array. Returns a ColorMeasurement.

def measure(
    image: str | Path | NDArray[np.uint8],
    *,
    palette_size: int = 10,
    grid: GridSize = GridSize.GRID_2X2,
    colors_per_region: int = 3,
    include_hash: bool = True,
    max_pixels: int = 0,
    use_mode: bool = True,
    consolidate: bool = True,
    delta_e_threshold: float = 0.03,
    max_output_colors: int = 5,
    include_text: bool = False,
    include_accents: bool = False,
    accent_min_pixels: int = 2000,
    smooth: bool = False,
) -> ColorMeasurement

| Parameter | Type | Default | Description |
|---|---|---|---|
| image | str \| Path \| ndarray | - | Image file path or uint8 RGB array (H, W, 3) |
| max_output_colors | int | 5 | Final palette size after consolidation |
| grid | GridSize | GRID_2X2 | Spatial grid: GRID_2X2 (4), GRID_3X3 (9), or GRID_4X4 (16 regions) |
| max_pixels | int | 0 | Additional downsample cap. 0 = no extra limit. Coriro internally downsamples to 400px max dimension for palette, spatial, and chroma passes |
| include_text | bool | False | Text foreground color extraction via OnnxTR (primary) or Tesseract (fallback) |
| include_accents | bool | False | Solid region detection for UI accent colors. Requires scipy |
| smooth | bool | False | CNN pixel smoothing before extraction. Requires torch + timm |

Advanced parameters (defaults are tuned for most use cases):

| Parameter | Type | Default | Description |
|---|---|---|---|
| palette_size | int | 10 | Colors to extract before consolidation |
| colors_per_region | int | 3 | Colors per spatial region |
| use_mode | bool | True | Mode-based extraction (exact pixel counts). Set False for k-means (better for photos/gradients) |
| consolidate | bool | True | Merge near-identical colors via ΔE distance in OKLab |
| delta_e_threshold | float | 0.03 | ΔE threshold for collapsing similar colors |
| accent_min_pixels | int | 2000 | Minimum pixel count for accent regions |
| include_hash | bool | True | Include truncated SHA-256 prefix of the source pixels (16 hex chars) |

Convenience methods on ColorMeasurement

| Method | Returns | Description |
|---|---|---|
| .to_json(indent=2) | str | JSON string. Set indent=None for compact output |
| .to_xml(include_spatial=True) | str | XML context block |
| .to_prompt(include_spatial=True) | str | Human-readable natural language |
| .to_dict() | dict | Python dict of the full measurement |
| .from_json(json_str) | ColorMeasurement | Deserialize from a JSON string (classmethod) |
| .from_dict(data) | ColorMeasurement | Deserialize from a dict (classmethod) |

The convenience methods .to_xml() and .to_prompt() include spatial data by default. The underlying serializer functions (to_context_block, to_system_prompt) default to include_spatial=False. Use the serializer functions directly when you need fine-grained control.

Types & color space

All types are frozen dataclasses (immutable). Primary types are importable from coriro directly. Additional types are available from coriro.schema.

from coriro import (
    ColorMeasurement,      # Top-level result
    OKLCHColor,            # Color in OKLCH space
    WeightedColor,         # Color + relative weight
    GridSize,              # Spatial grid enum
    SpatialBins,           # Spatial distribution
    RegionColor,           # Per-region color data
)

# Additional types from coriro.schema
from coriro.schema import (
    MeasurementMeta,           # Measurement scope and thresholds
    TextColorMeasurement,      # Text color result
    AccentMeasurement,         # Accent region result
    AccentRegion,              # Single accent region
)

OKLCHColor

OKLCH is a cylindrical form of OKLab designed for perceptual uniformity — ΔE thresholds mean the same thing across the entire color space. This is the canonical color representation throughout Coriro.

| Field | Type | Description |
|---|---|---|
| L | float | Lightness (0.0 = black, 1.0 = white) |
| C | float | Chroma (0.0 = gray, ~0.32 = max saturation in sRGB gamut) |
| H | float \| None | Hue in degrees (0–360). None for achromatic colors (C < 0.02) |
| sample_hex | str \| None | Hex of a real pixel from the cluster. Use this for implementation — it exists in the image |

Properties: .hex returns sample_hex if available, otherwise computes from L/C/H. .centroid_hex always computes from the OKLCH centroid (cluster average — may not correspond to any real pixel). .is_achromatic is true when chroma is below 0.02.

WeightedColor

| Field | Type | Description |
|---|---|---|
| color | OKLCHColor | The color value |
| weight | float | Relative dominance (0.0–1.0). Weights in a palette sum to 1.0 |

GridSize

| Value | Regions |
|---|---|
| GRID_2X2 | 4 regions (default) |
| GRID_3X3 | 9 regions |
| GRID_4X4 | 16 regions |

Regions are identified in reading order (left-to-right, top-to-bottom): R1C1, R1C2, R2C1, R2C2 for a 2x2 grid.
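The naming scheme can be generated mechanically; `region_ids` below is a hypothetical helper, not part of the Coriro API:

```python
def region_ids(rows: int, cols: int) -> list[str]:
    """Region identifiers in reading order: left-to-right, then top-to-bottom."""
    return [f"R{r}C{c}" for r in range(1, rows + 1) for c in range(1, cols + 1)]

region_ids(2, 2)  # ['R1C1', 'R1C2', 'R2C1', 'R2C2']
```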

ColorMeasurement

The top-level result container returned by measure().

| Field | Type | Description |
|---|---|---|
| dominant | OKLCHColor | The single most prominent color in the image |
| palette | tuple[WeightedColor, ...] | Ranked colors, most to least dominant. Weights sum to 1.0 |
| spatial | SpatialBins | Grid-based spatial color distribution (always computed) |
| measurement | MeasurementMeta \| None | Closed-world metadata — scope, thresholds, completeness |
| text_colors | TextColorMeasurement \| None | Text foreground colors (when include_text=True) |
| accent_regions | AccentMeasurement \| None | Solid accent regions (when include_accents=True) |
| image_hash | str \| None | Truncated SHA-256 prefix of the source pixels (16 hex chars) |

TextColorMeasurement

Present when include_text=True. Access text foreground colors via m.text_colors.colors — a tuple[WeightedColor, ...]. Remaining fields (scope, coverage, ocr_engine) are serialization metadata handled automatically by the serializers.

AccentMeasurement

Present when include_accents=True. Access detected regions via m.accent_regions.regions — a tuple[AccentRegion, ...]. Each AccentRegion has .color (OKLCHColor), .pixels (int), and .location (grid region ID or None). Remaining fields are serialization metadata.

MeasurementMeta

Serialized automatically by all output formats. Tells the receiving model the measurement scope, coverage thresholds, and palette cap — so it knows the palette is complete above those thresholds.

Serialization

Three serialization targets for different pipeline architectures. All formats produce structured text that can be passed alongside images in any multimodal pipeline — as prompt context, auxiliary input, or structured features. Compatible with Claude, GPT, Gemini, Llama, Qwen-VL, and custom architectures. How you integrate the output is your domain.

Each serializer is available as a standalone function for fine-grained control.

Tool output (JSON)

For VLMs with tool/function calling support. Recommended for production.

from coriro.runtime import to_tool_output

# Consolidated — hex + OKLCH + weight per color
json_str = to_tool_output(m, consolidated=True)

# Compact — token-constrained pipelines
json_str = to_tool_output(m, compact=True)

# Hex-only — minimal output, implementation-ready
json_str = to_tool_output(m, hex_only=True)

# With spatial data
json_str = to_tool_output(m, consolidated=True, include_spatial=True)

| Parameter | Default | Description |
|---|---|---|
| format | JSON | JSON or JSON_PRETTY |
| include_spatial | False | Include spatial region data |
| compact_colors | False | Compact color notation (L0.85/C0.15/H220) |
| compact | False | Minimal output. Implies compact_colors |
| include_hex | False | Include hex values alongside OKLCH |
| hex_only | False | Hex values only. Implies compact |
| image_id | None | Image identifier for multi-image contexts |
| consolidated | False | Design-friendly format: hex + OKLCH + weight per color |
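The compact notation (L0.85/C0.15/H220) is easy to parse back into numbers downstream. `parse_compact` is a hypothetical helper, not part of Coriro; note that achromatic colors omit the hue segment:

```python
def parse_compact(s: str) -> dict[str, float]:
    """Parse compact OKLCH notation like 'L0.25/C0.04/H237'.

    Achromatic colors omit the hue segment (e.g. 'L0.43/C0.01').
    """
    return {part[0]: float(part[1:]) for part in s.split("/")}

parse_compact("L0.25/C0.04/H237")  # {'L': 0.25, 'C': 0.04, 'H': 237.0}
parse_compact("L0.43/C0.01")       # {'L': 0.43, 'C': 0.01}
```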

Example output from a real website screenshot:

Dominant

#0D2330

Palette

#0D2330 60%
#0A1B24 36%
#E8654B 3%
#6E8F63 <1%
#A88BC0 <1%

Text Colors

#EBE6E2 51%
#8BAF7C 23%
#D4A843 9%
#556B73 6%
#A88BC0 6%
#E8654B 5%

Accent Regions

#253942 57,507px @ R1C2
#52514B 19,864px @ R1C2
#142125 14,175px @ R2C2
#8BAF7C 7,566px @ R1C2
#EBE6E2 2,210px @ R1C1

The full measurement serialized as consolidated tool-output JSON:

{
  "tool": "coriro_color_measurement",
  "measurement": {
    "version": "1.0",
    "scope": "area_dominant_surfaces",
    "coverage": "complete",
    "thresholds": { "min_area_pct": 1.0, "delta_e_collapse": 0.03 },
    "palette_cap": 5,
    "spatial_role": "diagnostic",
    "perceptual_supplements": 5
  },
  "dominant": { "hex": "#0D2330", "oklch": "L0.25/C0.04/H237" },
  "palette": [
    { "hex": "#0D2330", "oklch": "L0.25/C0.04/H237", "weight": 0.6 },
    { "hex": "#0A1B24", "oklch": "L0.21/C0.03/H234", "weight": 0.36 },
    { "hex": "#E8654B", "oklch": "L0.66/C0.17/H33", "weight": 0.03 },
    { "hex": "#6E8F63", "oklch": "L0.61/C0.07/H138", "weight": 0.0 },
    { "hex": "#A88BC0", "oklch": "L0.68/C0.08/H309", "weight": 0.0 }
  ],
  "spatial": {
    "R1C1": { "hex": "#0A1B24", "coverage": 0.50 },
    "R1C2": { "hex": "#0F242D", "coverage": 0.42 },
    "R2C1": { "hex": "#0D2330", "coverage": 0.64 },
    "R2C2": { "hex": "#0D2330", "coverage": 0.81 }
  },
  "text_colors": {
    "scope": "glyph_foreground",
    "coverage": "complete",
    "thresholds": { "min_area_pct": 0.5 },
    "ocr_engine": "onnxtr",
    "colors": [
      { "hex": "#EBE6E2", "oklch": "L0.93/C0.01", "weight": 0.51 },
      { "hex": "#8BAF7C", "oklch": "L0.71/C0.08/H137", "weight": 0.23 },
      { "hex": "#D4A843", "oklch": "L0.75/C0.13/H85", "weight": 0.09 },
      { "hex": "#556B73", "oklch": "L0.51/C0.03/H222", "weight": 0.06 },
      { "hex": "#A88BC0", "oklch": "L0.68/C0.08/H309", "weight": 0.06 },
      { "hex": "#E8654B", "oklch": "L0.66/C0.17/H33", "weight": 0.05 }
    ]
  },
  "accent_regions": {
    "scope": "solid_accent_regions",
    "coverage": "complete",
    "thresholds": { "min_pixels": 2000 },
    "regions": [
      { "hex": "#253942", "oklch": "L0.33/C0.03/H228", "pixels": 57507, "location": "R1C2" },
      { "hex": "#52514B", "oklch": "L0.43/C0.01", "pixels": 19864, "location": "R1C2" },
      { "hex": "#142125", "oklch": "L0.24/C0.02", "pixels": 14175, "location": "R2C2" },
      { "hex": "#8BAF7C", "oklch": "L0.71/C0.08/H137", "pixels": 7566, "location": "R1C2" },
      { "hex": "#EBE6E2", "oklch": "L0.93/C0.01", "pixels": 2210, "location": "R1C1" }
    ]
  }
}

Context block (XML / JSON / Markdown)

For inline injection alongside images. XML works well with Claude and Qwen.

from coriro.runtime import to_context_block, BlockFormat

xml_str = to_context_block(m, format=BlockFormat.XML)
json_str = to_context_block(m, format=BlockFormat.JSON)
md_str = to_context_block(m, format=BlockFormat.MARKDOWN)

| Parameter | Default | Description |
|---|---|---|
| format | XML | XML, JSON, or MARKDOWN |
| include_spatial | False | Include spatial region data |
| tag_name | "color_measurement" | Root tag/heading name |

System prompt (natural language)

Human-readable text for system prompt injection.

from coriro.runtime import to_system_prompt, SerializerFormat

text = to_system_prompt(m, format=SerializerFormat.NATURAL)
text = to_system_prompt(m, format=SerializerFormat.JSON)

| Parameter | Default | Description |
|---|---|---|
| format | NATURAL | NATURAL, JSON, or JSON_PRETTY |
| include_spatial | False | Include spatial region breakdown |
| preamble | True | Include explanatory preamble |

Advanced integration

The simplest use is appending Coriro output as text alongside the image in a VLM prompt. But the schema supports integration at any level of your architecture.

Measurements are normalized numerical values (L, C, H in perceptually uniform ranges, weights summing to 1.0) with explicit thresholds. Depending on your pipeline, you can:

  • Convert palette values into auxiliary feature vectors for multimodal fusion
  • Map measurements through a projection layer into your model's embedding space
  • Inject color features as auxiliary tokens alongside vision embeddings
  • Concatenate structured color vectors with image embeddings before downstream processing
  • Use measurements as conditioning signals for adapters, LoRA modules, or FiLM-style modulation layers
  • Feed data into custom embedding or feature store pipelines

Because Coriro exposes explicit perceptual thresholds (ΔE collapse distance, coverage floor), these measurements function as deterministic, reproducible conditioning signals — not heuristic approximations that vary with prompt phrasing.

Coriro does not prescribe how the data is consumed. It provides structured, perceptually grounded color measurements. Whether you integrate them as prompt metadata, auxiliary embeddings, model conditioning, or pipeline features is implementation-specific.
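As a concrete illustration of the feature-vector option, a palette can be flattened into a fixed-length vector. This is a sketch under assumed conventions; `palette_to_features` and its sin/cos hue encoding and zero-padding are not part of Coriro:

```python
import math

def palette_to_features(palette: list[dict], cap: int = 5) -> list[float]:
    """Flatten a palette into (L, C, sin H, cos H, weight) per slot.

    Hue is encoded as sin/cos so 359 deg and 1 deg land close together;
    achromatic entries (H is None) use (0, 0). Missing slots are zero-padded.
    """
    feats: list[float] = []
    for entry in palette[:cap]:
        h = entry.get("H")
        hs, hc = (0.0, 0.0) if h is None else (
            math.sin(math.radians(h)), math.cos(math.radians(h)))
        feats += [entry["L"], entry["C"], hs, hc, entry["weight"]]
    feats += [0.0] * (5 * cap - len(feats))  # zero-pad to cap slots
    return feats

# Values taken from the example measurement above
vec = palette_to_features([
    {"L": 0.25, "C": 0.04, "H": 237.0, "weight": 0.60},
    {"L": 0.21, "C": 0.03, "H": 234.0, "weight": 0.36},
])
len(vec)  # 25
```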

Optional features

Each optional pass is independent and adds its own field to the result without affecting the core palette. Enable any combination via measure() parameters.

Text colors

Extracts foreground text colors via OnnxTR (neural text detection, primary) or Tesseract OCR (fallback). Install with pip install coriro[text].

m = measure("page.png", include_text=True)
m.text_colors            # TextColorMeasurement or None
m.text_colors.colors     # tuple of WeightedColor

Accent regions

Detects solid-color UI regions (buttons, icons, badges) that may be too small for the area-dominant palette but are visually significant. Requires scipy.

m = measure("ui.png", include_accents=True, accent_min_pixels=2000)
m.accent_regions             # AccentMeasurement or None
m.accent_regions.regions     # tuple of AccentRegion (color, pixels, location)

CNN pixel smoothing

Stabilizes color surfaces before extraction — reducing noise from compression artifacts, gradient banding, and anti-aliasing. Requires torch + timm (pip install coriro[cnn]).

m = measure("image.png", smooth=True)

CNN smoothing is off by default because it requires torch + timm. If your environment already includes both, consider enabling smooth=True for improved measurement quality.

Latency control

Coriro internally downsamples to 400px max dimension (NEAREST resampling) for palette, spatial, and chroma passes. Text and accent detection run at full resolution. For additional control, set max_pixels to apply a user-specified cap before processing.

m = measure("image.png", max_pixels=1_500_000)
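The effect of the internal 400px cap can be previewed with simple arithmetic. `capped_size` is a hypothetical helper sketching the documented behavior; it ignores rounding details of the real resampler:

```python
def capped_size(w: int, h: int, max_dim: int = 400) -> tuple[int, int]:
    """Target size after capping the longest dimension, preserving aspect ratio."""
    if max(w, h) <= max_dim:
        return w, h
    scale = max_dim / max(w, h)
    return max(1, round(w * scale)), max(1, round(h * scale))

capped_size(1920, 1080)  # (400, 225)
capped_size(300, 200)    # (300, 200) -- already under the cap
```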

Hosted API

Coriro is a Python library. If your frontend is built in a different stack (Next.js, React, etc.), you can wrap Coriro in an HTTP endpoint and call it from your application.

Flask

from flask import Flask, request, jsonify
from coriro import measure
import tempfile, os

app = Flask(__name__)

@app.route("/measure", methods=["POST"])
def measure_image():
    file = request.files["image"]
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
    tmp.close()  # close before writing so the path is also writable on Windows
    file.save(tmp.name)

    m = measure(
        tmp.name,
        include_text=request.args.get("text", "false") == "true",
        include_accents=request.args.get("accents", "false") == "true",
    )
    os.unlink(tmp.name)

    return jsonify(m.to_dict())

FastAPI

from fastapi import FastAPI, UploadFile
from coriro import measure
import tempfile, os

app = FastAPI()

@app.post("/measure")
async def measure_image(
    image: UploadFile,
    text: bool = False,
    accents: bool = False,
):
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
    tmp.write(await image.read())
    tmp.close()

    m = measure(
        tmp.name,
        include_text=text,
        include_accents=accents,
    )
    os.unlink(tmp.name)

    return m.to_dict()

Calling from a web frontend

Once the Python server is running, call it from any frontend:

// JavaScript / TypeScript
const form = new FormData();
form.append("image", file);

const res = await fetch("https://your-server/measure?text=true", {
  method: "POST",
  body: form,
});

const measurement = await res.json();
// measurement.dominant, measurement.palette, measurement.text_colors, ...

This pattern works for any architecture: a Next.js frontend calling a Python backend on a separate server, a serverless function on AWS Lambda, or a container on Fly/Railway. Coriro runs wherever Python runs.

Use cases

VLM color grounding

Measured palettes as structured data alongside the image. Vision encoders lose color precision during patching — published benchmarks show significant color-task accuracy gaps across many models. Structured measurement data bypasses this limitation.

Screenshot-to-code

Measured hex values and spatial color distribution for code generation. Models get measured colors from text instead of inferring them from vision-encoded patches.

Product color metadata

Verifiable weighted palettes attached to product listings. Enables perceptual color search across catalogs, cross-SKU consistency checks via ΔE comparison, and early flagging of color mismatches that drive returns.

Design system compliance

Measure rendered screenshots against design token definitions. Perceptual ΔE distance catches real cross-platform drift that hex comparison misses.

Accessibility contrast auditing

Contrast computed from rendered pixels in perceptually uniform OKLCH. Covers text over gradients, background images, and overlapping elements that DOM-based scanners cannot evaluate.

Color-aware asset search

Images indexed by weighted palette and spatial distribution. Retrieval by color proportion, region placement, and perceptual similarity — not flat hex matching.

Design decisions

Why OKLCH, not HSL or RGB?

HSL is not perceptually uniform. Two colors with the same "S" value can look different in perceived saturation. OKLCH was designed for perceptual uniformity — ΔE thresholds mean the same thing across the entire color space. This makes consolidation (collapsing similar colors) reliable: a threshold of 0.03 produces consistent results whether comparing blues, reds, or grays.

Why sample_hex and centroid_hex?

K-means centroids are averages. The centroid of a cluster containing #FF0000 and #FF0200 might be #FF0100 — a color that doesn't exist in the image. For perceptual reasoning (is this warm? saturated? dark?), the centroid is fine. For CSS implementation, you need a real pixel value. sample_hex is the nearest actual pixel to the centroid within the cluster.
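The #FF0100 example can be reproduced directly. This is a toy illustration in RGB space, not Coriro's OKLab-space code, and the helper names are hypothetical:

```python
def hex_to_rgb(h: str) -> tuple[int, ...]:
    return tuple(int(h[i:i + 2], 16) for i in (1, 3, 5))

def rgb_to_hex(rgb) -> str:
    return "#" + "".join(f"{round(v):02X}" for v in rgb)

pixels = [hex_to_rgb(h) for h in ("#FF0000", "#FF0200")]

# Centroid: per-channel average -- may not exist in the image
centroid = [sum(ch) / len(pixels) for ch in zip(*pixels)]
rgb_to_hex(centroid)  # '#FF0100'

# sample_hex analogue: the real pixel nearest to the centroid
nearest = min(pixels, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, centroid)))
rgb_to_hex(nearest)  # '#FF0000' (both are equidistant; min keeps the first)
```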

Why closed-world metadata?

Without MeasurementMeta, a VLM treats the palette as advisory: "here are some colors, there might be more." The model fills gaps with vision estimates, which defeats the purpose. With the metadata block stating "coverage": "complete" above specific thresholds, the palette becomes authoritative. If green isn't listed, the model knows there's no green above 1% area — it doesn't need to guess.

Why achromatic/chromatic merge guard?

A dark red (L=0.20, C=0.04, H=30) and a near-black (L=0.20, C=0.01) can have a small ΔE. Without the guard, consolidation would merge them — collapsing a dark red accent into a generic black. The achromatic/chromatic boundary prevents this, preserving design intent.

Why mode-based extraction over k-means by default?

Screenshots are synthetic images with exact pixel values. A background color is literally the same RGB triplet across thousands of pixels. Mode-based counting (frequency of exact values) finds these precisely. K-means would average them into a slightly-off centroid. For photos and gradients where exact repetition is rare, k-means (use_mode=False) is more robust.
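A toy illustration of the difference (plain NumPy, not Coriro's extractor): exact-value counting recovers the true background value, while an average drifts slightly off it.

```python
import numpy as np

# Synthetic "screenshot": a flat background plus a few noisy pixels
img = np.full((100, 100, 3), (13, 35, 48), dtype=np.uint8)  # #0D2330 everywhere
img[:2, :2] = (14, 35, 48)                                  # 4 slightly-off pixels

# Mode-based counting: frequency of exact RGB triplets
flat = img.reshape(-1, 3)
values, counts = np.unique(flat, axis=0, return_counts=True)
dominant = values[counts.argmax()]
tuple(dominant)  # (13, 35, 48) -- the exact background value

# A mean (what a k-means centroid drifts toward) is not an exact pixel value
flat.mean(axis=0)  # ~[13.0004, 35.0, 48.0]
```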

Why is spatial data off by default in serialization?

For most web screenshots, the global palette carries sufficient signal — the model can see layout from the image itself. Spatial bins add tokens without proportional value in the common case. They're always computed (so the data is there) but serialization is opt-in for when layout-dependent color placement actually matters.