How Can AI Agents Learn to Design Better? — A Research Blog

TLDR: AI agents can write code and browse the web, but design sense remains an open problem. This blog tests the hypothesis that design can be encoded as agent-perceivable signals — CSS variables, semantic tokens, computed contrast ratios — and that agents can learn to evaluate and improve design through structured feedback.

The Question

AI agents are getting good at language tasks. They write code, summarize documents, browse websites, and execute multi-step workflows. But there’s one domain where they consistently struggle: visual design.

An agent can tell you the hex value of a button’s background color. It can list every CSS variable on a page. It can compute contrast ratios. But can it tell you whether that button looks good? Can it critique a layout? Can it learn to make better design decisions over time?

This blog exists to answer that question.

Why It Matters

Design is not just an aesthetic concern. Design decisions affect:

Usability: Sufficient contrast between text and background makes content easier to read
Trust: Consistent visual systems increase user confidence
Conversion: Layout decisions directly impact click-through and engagement
Accessibility: Design choices determine whether a site is usable by people with disabilities

If AI agents can learn to evaluate and improve design, they can:

Audit existing sites for design issues automatically
Generate layouts that follow proven design principles
Maintain visual consistency across growing design systems
Make accessibility recommendations as part of routine code review

The Approach

We’re approaching this question through four sub-investigations:

1. Perception: What can agents actually see? Agents don’t have eyes. They parse the DOM, read CSS, and access computed styles. The first question is which design signals are available in this data, and which of them are reliable. We’re building a catalog of agent-perceivable design signals: CSS variable coverage, contrast ratios, spacing consistency, and semantic naming coverage.

2. Criteria: What rules can agents apply? Some design heuristics can be encoded as computable rules: WCAG contrast thresholds, consistent spacing intervals, typographic hierarchy depth, and color palette harmony approximated through delta-E (perceptual color difference) distances. We’re testing which design heuristics map well to algorithmic evaluation.

3. Feedback: How do agents learn from results? We’re building feedback loops from A/B test results, user behavior data, and design critique annotations, so that agents can see the outcome of design decisions and update their evaluation models accordingly.

4. Tools: What infrastructure do agents need? Design token formats, agent-readable specs, linters, and comparison tools are the infrastructure that bridges design and AI.
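
To make the "computable rule" idea concrete, here is a minimal Python sketch of the WCAG 2.x contrast check mentioned under Criteria. The luminance and ratio formulas follow the WCAG definition. The function names (relative_luminance, contrast_ratio, passes_aa) are our own illustrative choices, and the sketch assumes colors arrive as plain 3- or 6-digit hex strings with no alpha channel.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x formula."""
    c = channel / 255
    # WCAG publishes 0.03928 as the cutoff; for 8-bit channels the result
    # is the same as with the sRGB standard's 0.04045.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rgb' or '#rrggbb' color (0 = black, 1 = white)."""
    digits = hex_color.lstrip("#")
    if len(digits) == 3:  # expand shorthand such as '#fff'
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported color format: {hex_color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

For example, passes_aa("#767676", "#ffffff") is true while passes_aa("#777777", "#ffffff") is false, which shows how sharp the threshold is: a one-step change in gray flips the result, and an agent can detect that without ever "seeing" the page.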

What We’ve Found So Far

CSS variables are the best agent-readable design signal. An agent can enumerate all --color-*, --space-*, and --font-* variables on a page and immediately understand the design vocabulary. Hardcoded values still appear in computed styles, but they carry no name or grouping, so automated analysis cannot tell which of them belong to the design system.
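
As a rough illustration of that enumeration step, the Python sketch below scans raw stylesheet text for custom property declarations and groups them by family prefix. It is a simplification under stated assumptions: a real agent would more likely read computed styles in a browser, the regexes ignore comments and other edge cases, and the names collect_custom_properties, hardcoded_hex_colors, and SAMPLE_CSS are hypothetical.

```python
import re
from collections import defaultdict

# Matches declarations such as "--color-error: #b00020" but not var(--color-error) usages.
CUSTOM_PROPERTY = re.compile(r"(--[A-Za-z0-9_-]+)\s*:\s*([^;}]+)")
# Rough heuristic: it would also match ID selectors made only of hex letters, e.g. "#fade".
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")


def collect_custom_properties(css_text: str) -> dict[str, dict[str, str]]:
    """Group custom properties by family: '--color-error' goes under 'color'."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for name, value in CUSTOM_PROPERTY.findall(css_text):
        family = name[2:].split("-", 1)[0]
        groups[family][name] = value.strip()
    return dict(groups)


def hardcoded_hex_colors(css_text: str) -> list[str]:
    """Hex literals that remain once custom property declarations are removed."""
    without_tokens = CUSTOM_PROPERTY.sub("", css_text)
    return HEX_COLOR.findall(without_tokens)


SAMPLE_CSS = """
:root {
  --color-error: #b00020;
  --color-surface: #ffffff;
  --space-sm: 8px;
  --font-body: "Inter", sans-serif;
}
.button {
  color: var(--color-error);
  padding: var(--space-sm);
  border-color: #cccccc;
}
"""

vocabulary = collect_custom_properties(SAMPLE_CSS)
unnamed = hardcoded_hex_colors(SAMPLE_CSS)
```

On the sample, vocabulary has the families color, space, and font, while unnamed contains only the border color. The value is readable, but nothing in the stylesheet says what role it plays.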

Semantic naming enables evaluation. A --color-error tells an agent what a color means, not just what it looks like. This is the foundation for agent-driven design critique — understanding intent, not just values.
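
One way to turn this into a measurable signal is a "semantic naming coverage" score: the share of tokens whose names describe a role instead of an appearance. The Python sketch below is only an illustration. The role and color word lists are assumptions we picked for the example, not a standard vocabulary, and classify_token and semantic_coverage are hypothetical names.

```python
# Assumed vocabularies for illustration only; a real design system would supply its own.
SEMANTIC_ROLES = {
    "error", "warning", "success", "info", "primary", "secondary",
    "surface", "background", "text", "border", "muted", "accent",
}
LITERAL_WORDS = {
    "red", "green", "blue", "yellow", "orange", "purple", "pink",
    "gray", "grey", "black", "white",
}


def classify_token(name: str) -> str:
    """Return 'semantic', 'literal', or 'unknown' for a name like '--color-error'."""
    # Drop the leading dashes and the family prefix ('color', 'space', ...).
    parts = name.lstrip("-").split("-")[1:]
    if any(part in SEMANTIC_ROLES for part in parts):
        return "semantic"
    if any(part in LITERAL_WORDS or part.isdigit() for part in parts):
        return "literal"
    return "unknown"


def semantic_coverage(names: list[str]) -> float:
    """Fraction of token names classified as semantic (0.0 for an empty list)."""
    if not names:
        return 0.0
    semantic = sum(1 for name in names if classify_token(name) == "semantic")
    return semantic / len(names)
```

Under these assumptions, --color-error counts as semantic and --color-red-500 as literal, so a palette split evenly between the two styles scores 0.5.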

Design token formats differ in agent-readability. We’ve documented the trade-offs between flat YAML and nested JSON and are building exporter pipelines between formats.
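
To show what such an exporter might look like, here is a minimal Python sketch that flattens a nested JSON token tree into flat "name: value" lines. It assumes leaves are plain strings or numbers and that nesting levels are joined with hyphens. The functions flatten_tokens and to_flat_yaml are hypothetical names, not part of any existing pipeline.

```python
import json


def flatten_tokens(nested: dict, prefix: str = "") -> dict:
    """Turn {'color': {'error': '#b00020'}} into {'color-error': '#b00020'}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}-{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_tokens(value, path))
        else:
            flat[path] = value
    return flat


def to_flat_yaml(flat: dict) -> str:
    """Emit one 'name: value' line per token, sorted by name.

    Values are written with json.dumps so that strings are double-quoted.
    Without quotes, YAML would read ' #b00020' as the start of a comment.
    """
    return "\n".join(
        f"{name}: {json.dumps(value)}" for name, value in sorted(flat.items())
    )


nested_tokens = json.loads(
    '{"color": {"error": "#b00020", "surface": "#ffffff"}, "space": {"sm": "8px"}}'
)
flat_tokens = flatten_tokens(nested_tokens)
flat_yaml = to_flat_yaml(flat_tokens)
```

The reverse direction is less clean: once keys are joined with hyphens, a name like color-text-muted no longer says where the nesting boundaries were. That loss of structure is one of the trade-offs between the two formats.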

Design heuristics can be encoded as linter rules. Roughly 60-70% of WCAG guidelines, spacing consistency rules, and typography best practices can be expressed as automated checks. The remaining 30-40% requires human judgment, and that’s where the interesting research question lies.
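
As a small example of a heuristic expressed as a linter rule, the Python sketch below flags spacing values that fall off a fixed scale. The 4px base unit, the rule id "spacing-scale", and the names Violation and lint_spacing are assumptions made for illustration. Only px values are checked, and other units are skipped.

```python
import re
from dataclasses import dataclass

PX_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)px$")


@dataclass
class Violation:
    rule: str
    target: str
    message: str


def lint_spacing(declarations: dict[str, str], base_unit: int = 4) -> list[Violation]:
    """Flag px spacing values that are not a multiple of base_unit."""
    violations = []
    for target, value in declarations.items():
        match = PX_VALUE.match(value.strip())
        if match is None:
            continue  # rem, em, percentages, calc() are out of scope for this sketch
        pixels = float(match.group(1))
        if pixels % base_unit != 0:
            violations.append(
                Violation(
                    rule="spacing-scale",
                    target=target,
                    message=f"{value.strip()} is not a multiple of {base_unit}px",
                )
            )
    return violations


report = lint_spacing(
    {"--space-sm": "8px", "--space-md": "14px", "--space-lg": "24px", "--space-fluid": "2rem"}
)
```

Here report contains a single violation for --space-md. Whether 14px is actually a mistake or a deliberate optical adjustment is the kind of question the rule cannot answer, which is where the human-judgment share comes in.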

What’s Next

This blog will publish regular experiments: design reviews scored by agent-perceivable criteria, A/B tests comparing agent-readable vs non-agent-readable designs, and comparisons of different design approaches through the lens of what an agent can learn from each.

Every post is tagged with the sub-question it addresses (perception, criteria, feedback, tools) so you can follow the investigation that interests you most. And the Design Comparator lets you run your own experiments — name two concepts, rate them across dimensions, and see how they compare.

If you’re building AI agents, working on design systems, or just curious about whether machines can develop a sense of aesthetics, welcome. This is a live research blog, and the answers are still emerging.