Documentation
v2.0.0
Install
```shell
pip install coriro
```
This installs the core package with NumPy and Pillow. Optional features require extra dependencies. If CNN dependencies are missing, smooth=True raises an error with an install hint. If text or accent dependencies are unavailable, those passes return None.
To install everything upfront:
```shell
pip install coriro[all]
```
The text extra installs onnxtr[cpu] (neural text detection) by default and falls back to pytesseract when needed. The fallback requires the tesseract-ocr system binary (brew install tesseract / apt install tesseract-ocr).
Requires Python 3.10+. No GPU needed.
Quick start
```python
from coriro import measure

m = measure("image.png")

# Get results in the format your pipeline needs
m.to_json()    # Raw measurement JSON (schema)
m.to_xml()     # XML for context blocks
m.to_prompt()  # Human-readable natural language
```
With all features enabled:
```python
from coriro import measure, GridSize

m = measure(
    "image.png",
    grid=GridSize.GRID_3X3,  # 9-region spatial grid
    smooth=True,             # CNN pixel stabilization (requires torch + timm)
    include_text=True,       # extract text foreground colors
    include_accents=True,    # detect solid accent regions (buttons, icons)
    max_output_colors=8,     # larger palette
)

# Consolidated JSON with spatial data
from coriro.runtime import to_tool_output

json_str = to_tool_output(m, consolidated=True, include_spatial=True)
```
API
measure()
The primary entry point. Accepts a file path, a pathlib.Path, or a NumPy uint8 RGB array. Returns a ColorMeasurement.
```python
def measure(
    image: str | Path | NDArray[np.uint8],
    *,
    palette_size: int = 10,
    grid: GridSize = GridSize.GRID_2X2,
    colors_per_region: int = 3,
    include_hash: bool = True,
    max_pixels: int = 0,
    use_mode: bool = True,
    consolidate: bool = True,
    delta_e_threshold: float = 0.03,
    max_output_colors: int = 5,
    include_text: bool = False,
    include_accents: bool = False,
    accent_min_pixels: int = 2000,
    smooth: bool = False,
) -> ColorMeasurement
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| image | str \| Path \| ndarray | - | Image file path or uint8 RGB array (H, W, 3) |
| max_output_colors | int | 5 | Final palette size after consolidation |
| grid | GridSize | GRID_2X2 | Spatial grid: GRID_2X2 (4), GRID_3X3 (9), or GRID_4X4 (16 regions) |
| max_pixels | int | 0 | Additional downsample cap. 0 = no extra limit. Coriro internally downsamples to 400px max dimension for palette, spatial, and chroma passes |
| include_text | bool | False | Text foreground color extraction via OnnxTR (primary) or Tesseract (fallback) |
| include_accents | bool | False | Solid region detection for UI accent colors. Requires scipy |
| smooth | bool | False | CNN pixel smoothing before extraction. Requires torch + timm |
Advanced parameters (defaults are tuned for most use cases):
| Parameter | Type | Default | Description |
|---|---|---|---|
| palette_size | int | 10 | Colors to extract before consolidation |
| colors_per_region | int | 3 | Colors per spatial region |
| use_mode | bool | True | Mode-based extraction (exact pixel counts). Set False for k-means (better for photos/gradients) |
| consolidate | bool | True | Merge near-identical colors via ΔE distance in OKLab |
| delta_e_threshold | float | 0.03 | ΔE threshold for collapsing similar colors |
| accent_min_pixels | int | 2000 | Minimum pixel count for accent regions |
| include_hash | bool | True | Include truncated SHA-256 prefix of the source pixels (16 hex chars) |
Convenience methods on ColorMeasurement
| Method | Returns | Description |
|---|---|---|
| .to_json(indent=2) | str | JSON string. Set indent=None for compact output |
| .to_xml(include_spatial=True) | str | XML context block |
| .to_prompt(include_spatial=True) | str | Human-readable natural language |
| .to_dict() | dict | Python dict of the full measurement |
| .from_json(json_str) | ColorMeasurement | Deserialize from JSON string (classmethod) |
| .from_dict(data) | ColorMeasurement | Deserialize from dict (classmethod) |
.to_xml() and .to_prompt() include spatial data by default. The underlying serializer functions (to_context_block, to_system_prompt) default to include_spatial=False. Use the serializer functions directly when you need fine-grained control.
Types & color space
All types are frozen dataclasses (immutable). Primary types are importable from coriro directly. Additional types are available from coriro.schema.
```python
from coriro import (
    ColorMeasurement,  # Top-level result
    OKLCHColor,        # Color in OKLCH space
    WeightedColor,     # Color + relative weight
    GridSize,          # Spatial grid enum
    SpatialBins,       # Spatial distribution
    RegionColor,       # Per-region color data
)

# Additional types from coriro.schema
from coriro.schema import (
    MeasurementMeta,       # Measurement scope and thresholds
    TextColorMeasurement,  # Text color result
    AccentMeasurement,     # Accent region result
    AccentRegion,          # Single accent region
)
```
OKLCHColor
OKLCH is a cylindrical form of OKLab designed for perceptual uniformity — ΔE thresholds mean the same thing across the entire color space. This is the canonical color representation throughout Coriro.
| Field | Type | Description |
|---|---|---|
| L | float | Lightness (0.0 = black, 1.0 = white) |
| C | float | Chroma (0.0 = gray, ~0.32 = max saturation in sRGB gamut) |
| H | float \| None | Hue in degrees (0–360). None for achromatic colors (C < 0.02) |
| sample_hex | str \| None | Hex of a real pixel from the cluster. Use this for implementation — it exists in the image |
Properties: .hex returns sample_hex if available, otherwise computes from L/C/H. .centroid_hex always computes from the OKLCH centroid (the cluster average, which may not correspond to any real pixel). .is_achromatic is True when chroma is below 0.02.
WeightedColor
| Field | Type | Description |
|---|---|---|
| color | OKLCHColor | The color value |
| weight | float | Relative dominance (0.0-1.0). Weights in a palette sum to 1.0 |
GridSize
| Value | Regions |
|---|---|
| GRID_2X2 | 4 regions (default) |
| GRID_3X3 | 9 regions |
| GRID_4X4 | 16 regions |
Regions are identified in reading order (left-to-right, top-to-bottom): R1C1, R1C2, R2C1, R2C2 for a 2x2 grid.
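The reading-order naming can be reproduced with simple integer math. This is an illustrative sketch of the ID scheme, not library code:

```python
def region_id(x, y, width, height, grid=2):
    """Map a pixel coordinate to its grid region ID, e.g. 'R1C2'.

    Rows and columns are 1-indexed in reading order
    (left-to-right, top-to-bottom); `grid` is 2, 3, or 4.
    """
    col = min(x * grid // width, grid - 1) + 1
    row = min(y * grid // height, grid - 1) + 1
    return f"R{row}C{col}"

# 2x2 grid over an 800x600 image
assert region_id(10, 10, 800, 600) == "R1C1"    # top-left
assert region_id(799, 599, 800, 600) == "R2C2"  # bottom-right
```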
ColorMeasurement
The top-level result container returned by measure().
| Field | Type | Description |
|---|---|---|
| dominant | OKLCHColor | The single most prominent color in the image |
| palette | tuple[WeightedColor, ...] | Ranked colors, most to least dominant. Weights sum to 1.0 |
| spatial | SpatialBins | Grid-based spatial color distribution (always computed) |
| measurement | MeasurementMeta \| None | Closed-world metadata — scope, thresholds, completeness |
| text_colors | TextColorMeasurement \| None | Text foreground colors (when include_text=True) |
| accent_regions | AccentMeasurement \| None | Solid accent regions (when include_accents=True) |
| image_hash | str \| None | Truncated SHA-256 prefix of the source pixels (16 hex chars) |
TextColorMeasurement
Present when include_text=True. Access text foreground colors via m.text_colors.colors — a tuple[WeightedColor, ...]. Remaining fields (scope, coverage, ocr_engine) are serialization metadata handled automatically by the serializers.
AccentMeasurement
Present when include_accents=True. Access detected regions via m.accent_regions.regions — a tuple[AccentRegion, ...]. Each AccentRegion has .color (OKLCHColor), .pixels (int), and .location (grid region ID or None). Remaining fields are serialization metadata.
MeasurementMeta
Serialized automatically by all output formats. Tells the receiving model the measurement scope, coverage thresholds, and palette cap — so it knows the palette is complete above those thresholds.
Serialization
Three serialization targets for different pipeline architectures. All formats produce structured text that can be passed alongside images in any multimodal pipeline — as prompt context, auxiliary input, or structured features. Compatible with Claude, GPT, Gemini, Llama, Qwen-VL, and custom architectures. How you integrate the output is your domain.
Each serializer is available as a standalone function for fine-grained control.
Tool output (JSON)
For VLMs with tool/function calling support. Recommended for production.
```python
from coriro.runtime import to_tool_output

# Consolidated — hex + OKLCH + weight per color
json_str = to_tool_output(m, consolidated=True)

# Compact — token-constrained pipelines
json_str = to_tool_output(m, compact=True)

# Hex-only — minimal output, implementation-ready
json_str = to_tool_output(m, hex_only=True)

# With spatial data
json_str = to_tool_output(m, consolidated=True, include_spatial=True)
```
| Parameter | Default | Description |
|---|---|---|
| format | JSON | JSON or JSON_PRETTY |
| include_spatial | False | Include spatial region data |
| compact_colors | False | Compact color notation (L0.85/C0.15/H220) |
| compact | False | Minimal output. Implies compact_colors |
| include_hex | False | Include hex values alongside OKLCH |
| hex_only | False | Hex values only. Implies compact |
| image_id | None | Image identifier for multi-image contexts |
| consolidated | False | Design-friendly format: hex + OKLCH + weight per color |
Example output from a real website screenshot, covering the dominant color, palette, text colors, and accent regions, serialized as tool-output JSON:
```json
{
  "tool": "coriro_color_measurement",
  "measurement": {
    "version": "1.0",
    "scope": "area_dominant_surfaces",
    "coverage": "complete",
    "thresholds": { "min_area_pct": 1.0, "delta_e_collapse": 0.03 },
    "palette_cap": 5,
    "spatial_role": "diagnostic",
    "perceptual_supplements": 5
  },
  "dominant": { "hex": "#0D2330", "oklch": "L0.25/C0.04/H237" },
  "palette": [
    { "hex": "#0D2330", "oklch": "L0.25/C0.04/H237", "weight": 0.6 },
    { "hex": "#0A1B24", "oklch": "L0.21/C0.03/H234", "weight": 0.36 },
    { "hex": "#E8654B", "oklch": "L0.66/C0.17/H33", "weight": 0.03 },
    { "hex": "#6E8F63", "oklch": "L0.61/C0.07/H138", "weight": 0.0 },
    { "hex": "#A88BC0", "oklch": "L0.68/C0.08/H309", "weight": 0.0 }
  ],
  "spatial": {
    "R1C1": { "hex": "#0A1B24", "coverage": 0.50 },
    "R1C2": { "hex": "#0F242D", "coverage": 0.42 },
    "R2C1": { "hex": "#0D2330", "coverage": 0.64 },
    "R2C2": { "hex": "#0D2330", "coverage": 0.81 }
  },
  "text_colors": {
    "scope": "glyph_foreground",
    "coverage": "complete",
    "thresholds": { "min_area_pct": 0.5 },
    "ocr_engine": "onnxtr",
    "colors": [
      { "hex": "#EBE6E2", "oklch": "L0.93/C0.01", "weight": 0.51 },
      { "hex": "#8BAF7C", "oklch": "L0.71/C0.08/H137", "weight": 0.23 },
      { "hex": "#D4A843", "oklch": "L0.75/C0.13/H85", "weight": 0.09 },
      { "hex": "#556B73", "oklch": "L0.51/C0.03/H222", "weight": 0.06 },
      { "hex": "#A88BC0", "oklch": "L0.68/C0.08/H309", "weight": 0.06 },
      { "hex": "#E8654B", "oklch": "L0.66/C0.17/H33", "weight": 0.05 }
    ]
  },
  "accent_regions": {
    "scope": "solid_accent_regions",
    "coverage": "complete",
    "thresholds": { "min_pixels": 2000 },
    "regions": [
      { "hex": "#253942", "oklch": "L0.33/C0.03/H228", "pixels": 57507, "location": "R1C2" },
      { "hex": "#52514B", "oklch": "L0.43/C0.01", "pixels": 19864, "location": "R1C2" },
      { "hex": "#142125", "oklch": "L0.24/C0.02", "pixels": 14175, "location": "R2C2" },
      { "hex": "#8BAF7C", "oklch": "L0.71/C0.08/H137", "pixels": 7566, "location": "R1C2" },
      { "hex": "#EBE6E2", "oklch": "L0.93/C0.01", "pixels": 2210, "location": "R1C1" }
    ]
  }
}
```

Context block (XML / JSON / Markdown)
For inline injection alongside images. XML works well with Claude and Qwen.
```python
from coriro.runtime import to_context_block, BlockFormat

xml_str = to_context_block(m, format=BlockFormat.XML)
json_str = to_context_block(m, format=BlockFormat.JSON)
md_str = to_context_block(m, format=BlockFormat.MARKDOWN)
```
| Parameter | Default | Description |
|---|---|---|
| format | XML | XML, JSON, or MARKDOWN |
| include_spatial | False | Include spatial region data |
| tag_name | "color_measurement" | Root tag/heading name |
System prompt (natural language)
Human-readable text for system prompt injection.
```python
from coriro.runtime import to_system_prompt, SerializerFormat

text = to_system_prompt(m, format=SerializerFormat.NATURAL)
text = to_system_prompt(m, format=SerializerFormat.JSON)
```
| Parameter | Default | Description |
|---|---|---|
| format | NATURAL | NATURAL, JSON, or JSON_PRETTY |
| include_spatial | False | Include spatial region breakdown |
| preamble | True | Include explanatory preamble |
Advanced integration
The simplest use is appending Coriro output as text alongside the image in a VLM prompt. But the schema supports integration at any level of your architecture.
Measurements are normalized numerical values (L, C, H in perceptually uniform ranges, weights summing to 1.0) with explicit thresholds. Depending on your pipeline, you can:
- Convert palette values into auxiliary feature vectors for multimodal fusion
- Map measurements through a projection layer into your model's embedding space
- Inject color features as auxiliary tokens alongside vision embeddings
- Concatenate structured color vectors with image embeddings before downstream processing
- Use measurements as conditioning signals for adapters, LoRA modules, or FiLM-style modulation layers
- Feed data into custom embedding or feature store pipelines
Because Coriro exposes explicit perceptual thresholds (ΔE collapse distance, coverage floor), these measurements function as deterministic, reproducible conditioning signals — not heuristic approximations that vary with prompt phrasing.
Coriro does not prescribe how the data is consumed. It provides structured, perceptually grounded color measurements. Whether you integrate them as prompt metadata, auxiliary embeddings, model conditioning, or pipeline features is implementation-specific.
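As one concrete instance of the feature-vector route, the compact OKLCH notation (`L0.25/C0.04/H237`) parses directly into normalized numbers; hue is circular, so sin/cos encoding avoids the 0°/360° discontinuity. This is an illustrative sketch, not a Coriro API — `oklch_features` is a hypothetical helper operating on entries from the consolidated JSON format:

```python
import math

def oklch_features(entry):
    """Turn one consolidated palette entry into a flat feature vector.

    `entry` is a dict like {"oklch": "L0.25/C0.04/H237", "weight": 0.6}.
    Achromatic colors omit the H component and get (0, 0) for sin/cos.
    """
    parts = {p[0]: float(p[1:]) for p in entry["oklch"].split("/")}
    has_hue = "H" in parts
    h_rad = math.radians(parts.get("H", 0.0))
    return [
        parts["L"],                              # lightness, 0..1
        parts["C"],                              # chroma, ~0..0.32
        math.sin(h_rad) if has_hue else 0.0,     # circular hue encoding
        math.cos(h_rad) if has_hue else 0.0,
        entry["weight"],                         # relative dominance
    ]

palette = [
    {"oklch": "L0.25/C0.04/H237", "weight": 0.6},
    {"oklch": "L0.93/C0.01", "weight": 0.4},  # achromatic: no hue
]
features = [oklch_features(e) for e in palette]
```

A fixed-size vector per color (padded to the palette cap) can then be concatenated or projected however your fusion architecture expects.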
Optional features
Each optional pass is independent and adds its own field to the result without affecting the core palette. Enable any combination via measure() parameters.
Text colors
Extracts foreground text colors via OnnxTR (neural text detection, primary) or Tesseract OCR (fallback). Install with pip install coriro[text].
```python
m = measure("page.png", include_text=True)

m.text_colors         # TextColorMeasurement or None
m.text_colors.colors  # tuple of WeightedColor
```
Accent regions
Detects solid-color UI regions (buttons, icons, badges) that may be too small for the area-dominant palette but are visually significant. Requires scipy.
```python
m = measure("ui.png", include_accents=True, accent_min_pixels=2000)

m.accent_regions          # AccentMeasurement or None
m.accent_regions.regions  # tuple of AccentRegion (color, pixels, location)
```
CNN pixel smoothing
Stabilizes color surfaces before extraction — reducing noise from compression artifacts, gradient banding, and anti-aliasing. Requires torch + timm (pip install coriro[cnn]).
```python
m = measure("image.png", smooth=True)
```
If your environment already includes torch and timm, consider enabling smooth=True for improved measurement quality.
Latency control
Coriro internally downsamples to 400px max dimension (NEAREST resampling) for palette, spatial, and chroma passes. Text and accent detection run at full resolution. For additional control, set max_pixels to apply a user-specified cap before processing.
```python
m = measure("image.png", max_pixels=1_500_000)
```
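The arithmetic behind a pixel cap like this can be sketched as uniform scaling so the total pixel count stays under the limit while aspect ratio is preserved (an illustration of what the cap implies, not Coriro's internals):

```python
import math

def capped_size(width, height, max_pixels):
    """Return (new_width, new_height) with new_width * new_height <= max_pixels.

    Preserves aspect ratio; max_pixels <= 0 means no extra limit.
    """
    if max_pixels <= 0 or width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# A 4K screenshot capped at 1.5 megapixels
w, h = capped_size(3840, 2160, 1_500_000)
assert w * h <= 1_500_000
```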
Hosted API
Coriro is a Python library. If your frontend is built in a different stack (Next.js, React, etc.), you can wrap Coriro in an HTTP endpoint and call it from your application.
Flask
```python
from flask import Flask, request, jsonify
from coriro import measure
import tempfile, os

app = Flask(__name__)

@app.route("/measure", methods=["POST"])
def measure_image():
    file = request.files["image"]
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
    file.save(tmp.name)
    tmp.close()  # close before re-reading (required on Windows)
    m = measure(
        tmp.name,
        include_text=request.args.get("text", "false") == "true",
        include_accents=request.args.get("accents", "false") == "true",
    )
    os.unlink(tmp.name)
    return jsonify(m.to_dict())
```
FastAPI
```python
from fastapi import FastAPI, UploadFile
from coriro import measure
import tempfile, os

app = FastAPI()

@app.post("/measure")
async def measure_image(
    image: UploadFile,
    text: bool = False,
    accents: bool = False,
):
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
    tmp.write(await image.read())
    tmp.close()
    m = measure(
        tmp.name,
        include_text=text,
        include_accents=accents,
    )
    os.unlink(tmp.name)
    return m.to_dict()
```
Calling from a web frontend
Once the Python server is running, call it from any frontend:
```javascript
// JavaScript / TypeScript
const form = new FormData();
form.append("image", file);

const res = await fetch("https://your-server/measure?text=true", {
  method: "POST",
  body: form,
});

const measurement = await res.json();
// measurement.dominant, measurement.palette, measurement.text_colors, ...
```
This pattern works for any architecture: a Next.js frontend calling a Python backend on a separate server, a serverless function on AWS Lambda, or a container on Fly/Railway. Coriro runs wherever Python runs.
Use cases
VLM color grounding
Measured palettes as structured data alongside the image. Vision encoders lose color precision during patching — published benchmarks show significant color-task accuracy gaps across many models. Structured measurement data bypasses this limitation.
Screenshot-to-code
Measured hex values and spatial color distribution for code generation. Models get measured colors from text instead of inferring them from vision-encoded patches.
Product color metadata
Verifiable weighted palettes attached to product listings. Enables perceptual color search across catalogs, cross-SKU consistency checks via ΔE comparison, and early flagging of color mismatches that drive returns.
Design system compliance
Measure rendered screenshots against design token definitions. Perceptual ΔE distance catches real cross-platform drift that hex comparison misses.
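A drift check of this kind can be sketched in pure Python using Björn Ottosson's published OKLab conversion coefficients. This is an illustrative sketch, not a Coriro API — `hex_to_oklab` and `delta_e` are hypothetical helpers, and ΔE here is plain Euclidean distance in OKLab:

```python
import math

def hex_to_oklab(hex_color):
    """Convert an sRGB hex string like '#E8654B' to OKLab (L, a, b)."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))

    def lin(c):  # sRGB companded -> linear
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = lin(r), lin(g), lin(b)
    # Linear sRGB -> LMS (cube-rooted) -> OKLab, per Ottosson's reference
    l = (0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b) ** (1 / 3)
    m = (0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b) ** (1 / 3)
    s = (0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b) ** (1 / 3)
    return (
        0.2104542553 * l + 0.7936177850 * m - 0.0040720468 * s,
        1.9779984951 * l - 2.4285922050 * m + 0.4505937099 * s,
        0.0259040371 * l + 0.7827717662 * m - 0.8086757660 * s,
    )

def delta_e(hex_a, hex_b):
    """Euclidean distance between two hex colors in OKLab."""
    return math.dist(hex_to_oklab(hex_a), hex_to_oklab(hex_b))

# Token says #E8654B; the rendered screenshot measured a nearby value.
# Compare the ΔE against your tolerance instead of exact hex equality.
drift = delta_e("#E8654B", "#E5684E")
```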
Accessibility contrast auditing
Contrast computed from rendered pixels in perceptually uniform OKLCH. Covers text over gradients, background images, and overlapping elements that DOM-based scanners cannot evaluate.
Color-aware asset search
Images indexed by weighted palette and spatial distribution. Retrieval by color proportion, region placement, and perceptual similarity — not flat hex matching.
Design decisions
Why OKLCH, not HSL or RGB?
HSL is not perceptually uniform. Two colors with the same "S" value can look different in perceived saturation. OKLCH was designed for perceptual uniformity — ΔE thresholds mean the same thing across the entire color space. This makes consolidation (collapsing similar colors) reliable: a threshold of 0.03 produces consistent results whether comparing blues, reds, or grays.
Why sample_hex and centroid_hex?
K-means centroids are averages. The centroid of a cluster containing #FF0000 and #FF0200 might be #FF0100 — a color that doesn't exist in the image. For perceptual reasoning (is this warm? saturated? dark?), the centroid is fine. For CSS implementation, you need a real pixel value. sample_hex is the nearest actual pixel to the centroid within the cluster.
Why closed-world metadata?
Without MeasurementMeta, a VLM treats the palette as advisory: "here are some colors, there might be more." The model fills gaps with vision estimates, which defeats the purpose. With the metadata block stating "coverage": "complete" above specific thresholds, the palette becomes authoritative. If green isn't listed, the model knows there's no green above 1% area — it doesn't need to guess.
Why achromatic/chromatic merge guard?
A dark red (L=0.20, C=0.04, H=30) and a near-black (L=0.20, C=0.01) can have a small ΔE. Without the guard, consolidation would merge them — collapsing a dark red accent into a generic black. The achromatic/chromatic boundary prevents this, preserving design intent.
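The guard reduces to a one-line predicate evaluated before any ΔE merge. An illustrative sketch using the 0.02 achromatic chroma threshold described earlier:

```python
ACHROMATIC_C = 0.02  # chroma below this counts as achromatic

def can_merge(chroma_a, chroma_b):
    """Two colors may merge only if both sit on the same side of the
    achromatic/chromatic boundary, regardless of how small their ΔE is."""
    return (chroma_a < ACHROMATIC_C) == (chroma_b < ACHROMATIC_C)

assert not can_merge(0.04, 0.01)  # dark red vs near-black: never merged
assert can_merge(0.04, 0.05)      # both chromatic: ΔE decides
```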
Why mode-based extraction over k-means by default?
Screenshots are synthetic images with exact pixel values. A background color is literally the same RGB triplet across thousands of pixels. Mode-based counting (frequency of exact values) finds these precisely. K-means would average them into a slightly-off centroid. For photos and gradients where exact repetition is rare, k-means (use_mode=False) is more robust.
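The mode-based path can be sketched as a frequency count over exact RGB triplets (a simplified illustration, not Coriro's implementation):

```python
from collections import Counter

def mode_palette(pixels, top=3):
    """Rank exact RGB triplets by frequency and normalize counts to weights.

    `pixels` is an iterable of (r, g, b) tuples. On synthetic images the
    background triplet repeats exactly, so it wins outright -- there is no
    averaging into a slightly-off centroid.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    return [(rgb, n / total) for rgb, n in counts.most_common(top)]

# A flat background with a small accent: mode recovers the exact values
pixels = [(13, 35, 48)] * 900 + [(232, 101, 75)] * 100
palette = mode_palette(pixels)
```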
Why is spatial data off by default in serialization?
For most web screenshots, the global palette carries sufficient signal — the model can see layout from the image itself. Spatial bins add tokens without proportional value in the common case. They're always computed (so the data is there) but serialization is opt-in for when layout-dependent color placement actually matters.