Documentation
v2.0.0
Install
```shell
pip install coriro
```
This installs the core package with NumPy and Pillow. Optional features require extra dependencies. If CNN dependencies are missing, smooth=True raises an error with an install hint. If text or accent dependencies are unavailable, those passes return None.
To install everything upfront:
```shell
pip install coriro[all]
```
The text extra installs onnxtr[cpu] (neural text detection) by default and falls back to pytesseract when needed. The fallback requires the tesseract-ocr system binary (brew install tesseract / apt install tesseract-ocr).
Requires Python 3.10+. No GPU needed.
Quick start
```python
from coriro import measure

m = measure("image.png")

# Get results in the format your pipeline needs
m.to_json()    # Raw measurement JSON (schema)
m.to_xml()     # XML for context blocks
m.to_prompt()  # Human-readable natural language
```
With all features enabled:
```python
from coriro import measure, GridSize

m = measure(
    "image.png",
    grid=GridSize.GRID_3X3,  # 9-region spatial grid
    smooth=True,             # CNN pixel stabilization (requires torch + timm)
    include_text=True,       # extract text foreground colors
    include_accents=True,    # detect solid accent regions (buttons, icons)
    max_output_colors=8,     # larger palette
)

# Consolidated JSON with spatial data
from coriro.runtime import to_tool_output

json_str = to_tool_output(m, consolidated=True, include_spatial=True)
```
API
measure()
The primary entry point. Accepts a file path, a pathlib.Path, or a NumPy uint8 RGB array. Returns a ColorMeasurement.
```python
def measure(
    image: str | Path | NDArray[np.uint8],
    *,
    palette_size: int = 10,
    grid: GridSize = GridSize.GRID_2X2,
    colors_per_region: int = 3,
    include_hash: bool = True,
    max_pixels: int = 0,
    use_mode: bool = True,
    consolidate: bool = True,
    delta_e_threshold: float = 0.03,
    max_output_colors: int = 5,
    include_text: bool = False,
    include_accents: bool = False,
    accent_min_pixels: int = 2000,
    smooth: bool = False,
) -> ColorMeasurement
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| image | str \| Path \| ndarray | - | Image file path or uint8 RGB array (H, W, 3) |
| max_output_colors | int | 5 | Final palette size after consolidation |
| grid | GridSize | GRID_2X2 | Spatial grid: GRID_2X2 (4), GRID_3X3 (9), or GRID_4X4 (16 regions) |
| max_pixels | int | 0 | Additional downsample cap. 0 = no extra limit. Coriro internally downsamples to 400px max dimension for palette, spatial, and chroma passes |
| include_text | bool | False | Text foreground color extraction via OnnxTR (primary) or Tesseract (fallback) |
| include_accents | bool | False | Solid region detection for UI accent colors. Requires scipy |
| smooth | bool | False | CNN pixel smoothing before extraction. Requires torch + timm |
Advanced parameters (defaults are tuned for most use cases):
| Parameter | Type | Default | Description |
|---|---|---|---|
| palette_size | int | 10 | Colors to extract before consolidation |
| colors_per_region | int | 3 | Colors per spatial region |
| use_mode | bool | True | Mode-based extraction (exact pixel counts). Set False for k-means (better for photos/gradients) |
| consolidate | bool | True | Merge near-identical colors via ΔE distance in OKLab |
| delta_e_threshold | float | 0.03 | ΔE threshold for collapsing similar colors |
| accent_min_pixels | int | 2000 | Minimum pixel count for accent regions |
| include_hash | bool | True | Include truncated SHA-256 prefix of the source pixels (16 hex chars) |
Convenience methods on ColorMeasurement
| Method | Returns | Description |
|---|---|---|
| .to_json(indent=2) | str | JSON string. Set indent=None for compact output |
| .to_xml(include_spatial=True) | str | XML context block |
| .to_prompt(include_spatial=True) | str | Human-readable natural language |
| .to_dict() | dict | Python dict of the full measurement |
| .from_json(json_str) | ColorMeasurement | Deserialize from JSON string (classmethod) |
| .from_dict(data) | ColorMeasurement | Deserialize from dict (classmethod) |
.to_xml() and .to_prompt() include spatial data by default. The underlying serializer functions (to_context_block, to_system_prompt) default to include_spatial=False. Use the serializer functions directly when you need fine-grained control.
Types & color space
All types are frozen dataclasses (immutable). Primary types are importable from coriro directly. Additional types are available from coriro.schema.
```python
from coriro import (
    ColorMeasurement,  # Top-level result
    OKLCHColor,        # Color in OKLCH space
    WeightedColor,     # Color + relative weight
    GridSize,          # Spatial grid enum
    SpatialBins,       # Spatial distribution
    RegionColor,       # Per-region color data
)

# Additional types from coriro.schema
from coriro.schema import (
    MeasurementMeta,       # Measurement scope and thresholds
    TextColorMeasurement,  # Text color result
    AccentMeasurement,     # Accent region result
    AccentRegion,          # Single accent region
)
```
OKLCHColor
OKLCH is a cylindrical form of OKLab designed for perceptual uniformity — ΔE thresholds mean the same thing across the entire color space. This is the canonical color representation throughout Coriro.
| Field | Type | Description |
|---|---|---|
| L | float | Lightness (0.0 = black, 1.0 = white) |
| C | float | Chroma (0.0 = gray, ~0.32 = max saturation in sRGB gamut) |
| H | float \| None | Hue in degrees (0–360). None for achromatic colors (C < 0.02) |
| sample_hex | str \| None | Hex of a real pixel from the cluster. Use this for implementation — it exists in the image |
Properties: .hex returns sample_hex if available, otherwise computes from L/C/H. .centroid_hex always computes from the OKLCH centroid (the cluster average, which may not correspond to any real pixel). .is_achromatic is True when chroma is below 0.02.
WeightedColor
| Field | Type | Description |
|---|---|---|
| color | OKLCHColor | The color value |
| weight | float | Relative dominance (0.0-1.0). Weights in a palette sum to 1.0 |
GridSize
| Value | Regions |
|---|---|
| GRID_2X2 | 4 regions (default) |
| GRID_3X3 | 9 regions |
| GRID_4X4 | 16 regions |
Regions are identified in reading order (left-to-right, top-to-bottom): R1C1, R1C2, R2C1, R2C2 for a 2x2 grid.
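The reading-order naming can be reproduced with simple integer math. This is an illustrative sketch of the ID scheme, not library code:

```python
def region_id(x, y, width, height, grid=2):
    """Map a pixel coordinate to its grid region ID, e.g. 'R1C2'.

    Rows and columns are 1-indexed in reading order
    (left-to-right, top-to-bottom); `grid` is 2, 3, or 4.
    """
    col = min(x * grid // width, grid - 1) + 1
    row = min(y * grid // height, grid - 1) + 1
    return f"R{row}C{col}"

# 2x2 grid over an 800x600 image
assert region_id(10, 10, 800, 600) == "R1C1"    # top-left
assert region_id(799, 599, 800, 600) == "R2C2"  # bottom-right
```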
ColorMeasurement
The top-level result container returned by measure().
| Field | Type | Description |
|---|---|---|
| dominant | OKLCHColor | The single most prominent color in the image |
| palette | tuple[WeightedColor, ...] | Ranked colors, most to least dominant. Weights sum to 1.0 |
| spatial | SpatialBins | Grid-based spatial color distribution (always computed) |
| measurement | MeasurementMeta \| None | Closed-world metadata — scope, thresholds, completeness |
| text_colors | TextColorMeasurement \| None | Text foreground colors (when include_text=True) |
| accent_regions | AccentMeasurement \| None | Solid accent regions (when include_accents=True) |
| image_hash | str \| None | Truncated SHA-256 prefix of the source pixels (16 hex chars) |
TextColorMeasurement
Present when include_text=True. Access text foreground colors via m.text_colors.colors — a tuple[WeightedColor, ...]. Remaining fields (scope, coverage, ocr_engine) are serialization metadata handled automatically by the serializers.
AccentMeasurement
Present when include_accents=True. Access detected regions via m.accent_regions.regions — a tuple[AccentRegion, ...]. Each AccentRegion has .color (OKLCHColor), .pixels (int), and .location (grid region ID or None). Remaining fields are serialization metadata.
MeasurementMeta
Serialized automatically by all output formats. Tells the receiving model the measurement scope, coverage thresholds, and palette cap — so it knows the palette is complete above those thresholds.
Serialization
Three serialization targets for different pipeline architectures. All formats produce structured text that can be passed alongside images in any multimodal pipeline — as prompt context, auxiliary input, or structured features. Compatible with Claude, GPT, Gemini, Llama, Qwen-VL, and custom architectures. How you integrate the output is your domain.
Each serializer is available as a standalone function for fine-grained control.
Tool output (JSON)
For VLMs with tool/function calling support. Recommended for production.
```python
from coriro.runtime import to_tool_output

# Consolidated — hex + OKLCH + weight per color
json_str = to_tool_output(m, consolidated=True)

# Compact — token-constrained pipelines
json_str = to_tool_output(m, compact=True)

# Hex-only — minimal output, implementation-ready
json_str = to_tool_output(m, hex_only=True)

# With spatial data
json_str = to_tool_output(m, consolidated=True, include_spatial=True)
```
| Parameter | Default | Description |
|---|---|---|
| format | JSON | JSON or JSON_PRETTY |
| include_spatial | False | Include spatial region data |
| compact_colors | False | Compact color notation (L0.85/C0.15/H220) |
| compact | False | Minimal output. Implies compact_colors |
| include_hex | False | Include hex values alongside OKLCH |
| hex_only | False | Hex values only. Implies compact |
| image_id | None | Image identifier for multi-image contexts |
| consolidated | False | Design-friendly format: hex + OKLCH + weight per color |
Example output from a real website screenshot, covering the dominant color, palette, text colors, and accent regions, serialized as tool-output JSON:
```json
{
  "tool": "coriro_color_measurement",
  "measurement": {
    "version": "1.0",
    "scope": "area_dominant_surfaces",
    "coverage": "complete",
    "thresholds": { "min_area_pct": 1.0, "delta_e_collapse": 0.03 },
    "palette_cap": 5,
    "spatial_role": "diagnostic",
    "perceptual_supplements": 5
  },
  "dominant": { "hex": "#0D2330", "oklch": "L0.25/C0.04/H237" },
  "palette": [
    { "hex": "#0D2330", "oklch": "L0.25/C0.04/H237", "weight": 0.6 },
    { "hex": "#0A1B24", "oklch": "L0.21/C0.03/H234", "weight": 0.36 },
    { "hex": "#E8654B", "oklch": "L0.66/C0.17/H33", "weight": 0.03 },
    { "hex": "#6E8F63", "oklch": "L0.61/C0.07/H138", "weight": 0.0 },
    { "hex": "#A88BC0", "oklch": "L0.68/C0.08/H309", "weight": 0.0 }
  ],
  "spatial": {
    "R1C1": { "hex": "#0A1B24", "coverage": 0.50 },
    "R1C2": { "hex": "#0F242D", "coverage": 0.42 },
    "R2C1": { "hex": "#0D2330", "coverage": 0.64 },
    "R2C2": { "hex": "#0D2330", "coverage": 0.81 }
  },
  "text_colors": {
    "scope": "glyph_foreground",
    "coverage": "complete",
    "thresholds": { "min_area_pct": 0.5 },
    "ocr_engine": "onnxtr",
    "colors": [
      { "hex": "#EBE6E2", "oklch": "L0.93/C0.01", "weight": 0.51 },
      { "hex": "#8BAF7C", "oklch": "L0.71/C0.08/H137", "weight": 0.23 },
      { "hex": "#D4A843", "oklch": "L0.75/C0.13/H85", "weight": 0.09 },
      { "hex": "#556B73", "oklch": "L0.51/C0.03/H222", "weight": 0.06 },
      { "hex": "#A88BC0", "oklch": "L0.68/C0.08/H309", "weight": 0.06 },
      { "hex": "#E8654B", "oklch": "L0.66/C0.17/H33", "weight": 0.05 }
    ]
  },
  "accent_regions": {
    "scope": "solid_accent_regions",
    "coverage": "complete",
    "thresholds": { "min_pixels": 2000 },
    "regions": [
      { "hex": "#253942", "oklch": "L0.33/C0.03/H228", "pixels": 57507, "location": "R1C2" },
      { "hex": "#52514B", "oklch": "L0.43/C0.01", "pixels": 19864, "location": "R1C2" },
      { "hex": "#142125", "oklch": "L0.24/C0.02", "pixels": 14175, "location": "R2C2" },
      { "hex": "#8BAF7C", "oklch": "L0.71/C0.08/H137", "pixels": 7566, "location": "R1C2" },
      { "hex": "#EBE6E2", "oklch": "L0.93/C0.01", "pixels": 2210, "location": "R1C1" }
    ]
  }
}
```

Context block (XML / JSON / Markdown)
For inline injection alongside images. XML works well with Claude and Qwen.
```python
from coriro.runtime import to_context_block, BlockFormat

xml_str = to_context_block(m, format=BlockFormat.XML)
json_str = to_context_block(m, format=BlockFormat.JSON)
md_str = to_context_block(m, format=BlockFormat.MARKDOWN)
```
| Parameter | Default | Description |
|---|---|---|
| format | XML | XML, JSON, or MARKDOWN |
| include_spatial | False | Include spatial region data |
| tag_name | "color_measurement" | Root tag/heading name |
System prompt (natural language)
Human-readable text for system prompt injection.
```python
from coriro.runtime import to_system_prompt, SerializerFormat

text = to_system_prompt(m, format=SerializerFormat.NATURAL)
text = to_system_prompt(m, format=SerializerFormat.JSON)
```
| Parameter | Default | Description |
|---|---|---|
| format | NATURAL | NATURAL, JSON, or JSON_PRETTY |
| include_spatial | False | Include spatial region breakdown |
| preamble | True | Include explanatory preamble |
Advanced integration
The simplest use is appending Coriro output as text alongside the image in a VLM prompt. But the schema supports integration at any level of your architecture.
Measurements are normalized numerical values (L, C, H in perceptually uniform ranges, weights summing to 1.0) with explicit thresholds. Depending on your pipeline, you can:
- Convert palette values into auxiliary feature vectors for multimodal fusion
- Map measurements through a projection layer into your model's embedding space
- Inject color features as auxiliary tokens alongside vision embeddings
- Concatenate structured color vectors with image embeddings before downstream processing
- Use measurements as conditioning signals for adapters, LoRA modules, or FiLM-style modulation layers
- Feed data into custom embedding or feature store pipelines
Because Coriro exposes explicit perceptual thresholds (ΔE collapse distance, coverage floor), these measurements function as deterministic, reproducible conditioning signals — not heuristic approximations that vary with prompt phrasing.
Coriro does not prescribe how the data is consumed. It provides structured, perceptually grounded color measurements. Whether you integrate them as prompt metadata, auxiliary embeddings, model conditioning, or pipeline features is implementation-specific.
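As one concrete instance of the feature-vector route, the compact OKLCH notation (`L0.25/C0.04/H237`) parses directly into normalized numbers; hue is circular, so sin/cos encoding avoids the 0°/360° discontinuity. This is an illustrative sketch, not a Coriro API — `oklch_features` is a hypothetical helper operating on entries from the consolidated JSON format:

```python
import math

def oklch_features(entry):
    """Turn one consolidated palette entry into a flat feature vector.

    `entry` is a dict like {"oklch": "L0.25/C0.04/H237", "weight": 0.6}.
    Achromatic colors omit the H component and get (0, 0) for sin/cos.
    """
    parts = {p[0]: float(p[1:]) for p in entry["oklch"].split("/")}
    has_hue = "H" in parts
    h_rad = math.radians(parts.get("H", 0.0))
    return [
        parts["L"],                              # lightness, 0..1
        parts["C"],                              # chroma, ~0..0.32
        math.sin(h_rad) if has_hue else 0.0,     # circular hue encoding
        math.cos(h_rad) if has_hue else 0.0,
        entry["weight"],                         # relative dominance
    ]

palette = [
    {"oklch": "L0.25/C0.04/H237", "weight": 0.6},
    {"oklch": "L0.93/C0.01", "weight": 0.4},  # achromatic: no hue
]
features = [oklch_features(e) for e in palette]
```

A fixed-size vector per color (padded to the palette cap) can then be concatenated or projected however your fusion architecture expects.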
Optional features
Each optional pass is independent and adds its own field to the result without affecting the core palette. Enable any combination via measure() parameters.
Text colors
Extracts foreground text colors via OnnxTR (neural text detection, primary) or Tesseract OCR (fallback). Install with pip install coriro[text].
```python
m = measure("page.png", include_text=True)

m.text_colors         # TextColorMeasurement or None
m.text_colors.colors  # tuple of WeightedColor
```
Accent regions
Detects solid-color UI regions (buttons, icons, badges) that may be too small for the area-dominant palette but are visually significant. Requires scipy.
```python
m = measure("ui.png", include_accents=True, accent_min_pixels=2000)

m.accent_regions          # AccentMeasurement or None
m.accent_regions.regions  # tuple of AccentRegion (color, pixels, location)
```
CNN pixel smoothing
Stabilizes color surfaces before extraction — reducing noise from compression artifacts, gradient banding, and anti-aliasing. Requires torch + timm (pip install coriro[cnn]).
```python
m = measure("image.png", smooth=True)
```
If your environment already includes torch and timm, consider enabling smooth=True for improved measurement quality.
Latency control
Coriro internally downsamples to 400px max dimension (NEAREST resampling) for palette, spatial, and chroma passes. Text and accent detection run at full resolution. For additional control, set max_pixels to apply a user-specified cap before processing.
```python
m = measure("image.png", max_pixels=1_500_000)
```
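The arithmetic behind a pixel cap like this can be sketched as uniform scaling so the total pixel count stays under the limit while aspect ratio is preserved (an illustration of what the cap implies, not Coriro's internals):

```python
import math

def capped_size(width, height, max_pixels):
    """Return (new_width, new_height) with new_width * new_height <= max_pixels.

    Preserves aspect ratio; max_pixels <= 0 means no extra limit.
    """
    if max_pixels <= 0 or width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# A 4K screenshot capped at 1.5 megapixels
w, h = capped_size(3840, 2160, 1_500_000)
assert w * h <= 1_500_000
```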
Hosted API
Coriro is a Python library. If your frontend is built in a different stack (Next.js, React, etc.), you can wrap Coriro in an HTTP endpoint and call it from your application.
Flask
```python
from flask import Flask, request, jsonify
from coriro import measure
import tempfile, os

app = Flask(__name__)

@app.route("/measure", methods=["POST"])
def measure_image():
    file = request.files["image"]
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
    file.save(tmp.name)
    tmp.close()  # close before re-reading (required on Windows)
    m = measure(
        tmp.name,
        include_text=request.args.get("text", "false") == "true",
        include_accents=request.args.get("accents", "false") == "true",
    )
    os.unlink(tmp.name)
    return jsonify(m.to_dict())
```
FastAPI
```python
from fastapi import FastAPI, UploadFile
from coriro import measure
import tempfile, os

app = FastAPI()

@app.post("/measure")
async def measure_image(
    image: UploadFile,
    text: bool = False,
    accents: bool = False,
):
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
    tmp.write(await image.read())
    tmp.close()
    m = measure(
        tmp.name,
        include_text=text,
        include_accents=accents,
    )
    os.unlink(tmp.name)
    return m.to_dict()
```
Calling from a web frontend
Once the Python server is running, call it from any frontend:
```javascript
// JavaScript / TypeScript
const form = new FormData();
form.append("image", file);

const res = await fetch("https://your-server/measure?text=true", {
  method: "POST",
  body: form,
});

const measurement = await res.json();
// measurement.dominant, measurement.palette, measurement.text_colors, ...
```
This pattern works for any architecture: a Next.js frontend calling a Python backend on a separate server, a serverless function on AWS Lambda, or a container on Fly/Railway. Coriro runs wherever Python runs.
Use cases
VLM color grounding
Measured palettes as structured data alongside the image. Vision encoders lose color precision during patching — published benchmarks show significant color-task accuracy gaps across many models. Structured measurement data bypasses this limitation.
Screenshot-to-code
Measured hex values and spatial color distribution for code generation. Models get measured colors from text instead of inferring them from vision-encoded patches.
Product color metadata
Verifiable weighted palettes attached to product listings. Enables perceptual color search across catalogs, cross-SKU consistency checks via ΔE comparison, and early flagging of color mismatches that drive returns.
Design system compliance
Measure rendered screenshots against design token definitions. Perceptual ΔE distance catches real cross-platform drift that hex comparison misses.
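A drift check of this kind can be sketched in pure Python using Björn Ottosson's published OKLab conversion coefficients. This is an illustrative sketch, not a Coriro API — `hex_to_oklab` and `delta_e` are hypothetical helpers, and ΔE here is plain Euclidean distance in OKLab:

```python
import math

def hex_to_oklab(hex_color):
    """Convert an sRGB hex string like '#E8654B' to OKLab (L, a, b)."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))

    def lin(c):  # sRGB companded -> linear
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = lin(r), lin(g), lin(b)
    # Linear sRGB -> LMS (cube-rooted) -> OKLab, per Ottosson's reference
    l = (0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b) ** (1 / 3)
    m = (0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b) ** (1 / 3)
    s = (0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b) ** (1 / 3)
    return (
        0.2104542553 * l + 0.7936177850 * m - 0.0040720468 * s,
        1.9779984951 * l - 2.4285922050 * m + 0.4505937099 * s,
        0.0259040371 * l + 0.7827717662 * m - 0.8086757660 * s,
    )

def delta_e(hex_a, hex_b):
    """Euclidean distance between two hex colors in OKLab."""
    return math.dist(hex_to_oklab(hex_a), hex_to_oklab(hex_b))

# Token says #E8654B; the rendered screenshot measured a nearby value.
# Compare the ΔE against your tolerance instead of exact hex equality.
drift = delta_e("#E8654B", "#E5684E")
```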
Accessibility contrast auditing
Contrast computed from rendered pixels in perceptually uniform OKLCH. Covers text over gradients, background images, and overlapping elements that DOM-based scanners cannot evaluate.
Color-aware asset search
Images indexed by weighted palette and spatial distribution. Retrieval by color proportion, region placement, and perceptual similarity — not flat hex matching.
Design decisions
Why OKLCH, not HSL or RGB?
HSL is not perceptually uniform. Two colors with the same "S" value can look different in perceived saturation. OKLCH was designed for perceptual uniformity — ΔE thresholds mean the same thing across the entire color space. This makes consolidation (collapsing similar colors) reliable: a threshold of 0.03 produces consistent results whether comparing blues, reds, or grays.
Why sample_hex and centroid_hex?
K-means centroids are averages. The centroid of a cluster containing #FF0000 and #FF0200 might be #FF0100 — a color that doesn't exist in the image. For perceptual reasoning (is this warm? saturated? dark?), the centroid is fine. For CSS implementation, you need a real pixel value. sample_hex is the nearest actual pixel to the centroid within the cluster.
Why closed-world metadata?
Without MeasurementMeta, a VLM treats the palette as advisory: "here are some colors, there might be more." The model fills gaps with vision estimates, which defeats the purpose. With the metadata block stating "coverage": "complete" above specific thresholds, the palette becomes authoritative. If green isn't listed, the model knows there's no green above 1% area — it doesn't need to guess.
Why achromatic/chromatic merge guard?
A dark red (L=0.20, C=0.04, H=30) and a near-black (L=0.20, C=0.01) can have a small ΔE. Without the guard, consolidation would merge them — collapsing a dark red accent into a generic black. The achromatic/chromatic boundary prevents this, preserving design intent.
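The guard reduces to a one-line predicate evaluated before any ΔE merge. An illustrative sketch using the 0.02 achromatic chroma threshold described earlier:

```python
ACHROMATIC_C = 0.02  # chroma below this counts as achromatic

def can_merge(chroma_a, chroma_b):
    """Two colors may merge only if both sit on the same side of the
    achromatic/chromatic boundary, regardless of how small their ΔE is."""
    return (chroma_a < ACHROMATIC_C) == (chroma_b < ACHROMATIC_C)

assert not can_merge(0.04, 0.01)  # dark red vs near-black: never merged
assert can_merge(0.04, 0.05)      # both chromatic: ΔE decides
```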
Why mode-based extraction over k-means by default?
Screenshots are synthetic images with exact pixel values. A background color is literally the same RGB triplet across thousands of pixels. Mode-based counting (frequency of exact values) finds these precisely. K-means would average them into a slightly-off centroid. For photos and gradients where exact repetition is rare, k-means (use_mode=False) is more robust.
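The mode-based path can be sketched as a frequency count over exact RGB triplets (a simplified illustration, not Coriro's implementation):

```python
from collections import Counter

def mode_palette(pixels, top=3):
    """Rank exact RGB triplets by frequency and normalize counts to weights.

    `pixels` is an iterable of (r, g, b) tuples. On synthetic images the
    background triplet repeats exactly, so it wins outright -- there is no
    averaging into a slightly-off centroid.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    return [(rgb, n / total) for rgb, n in counts.most_common(top)]

# A flat background with a small accent: mode recovers the exact values
pixels = [(13, 35, 48)] * 900 + [(232, 101, 75)] * 100
palette = mode_palette(pixels)
```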
Why is spatial data off by default in serialization?
For most web screenshots, the global palette carries sufficient signal — the model can see layout from the image itself. Spatial bins add tokens without proportional value in the common case. They're always computed (so the data is there) but serialization is opt-in for when layout-dependent color placement actually matters.