Work — Jody Allard

AI & Prompt Engineering

Generating match explanations with an LLM

Thumbtack

Pros view Thumbtack as a lead generation platform and expect to only pay for high-quality leads. But pros select their job preferences at onboarding and rarely update them — so they often miss out on potential matches. When customers don't respond, pros lose confidence in the platform and risk churning.

Our team's hypothesis: by creating more matches and explaining how we make them, we could increase pro supply and restore confidence in the platform.

↑ Content model design in practice — not writing what the product says, but designing how the system communicates with its users.

Thumbtack leads list showing the match explanation design — 'Why this job?' with a short, targeted explanation surfacing the most important insight for each lead.

The solution: match explanations

To solve this, the team built match explanations — AI-generated summaries that appear on each lead in a pro's feed, telling them exactly why Thumbtack selected them for that specific customer. Rather than receiving a lead with no context, pros see a personalized explanation like "This customer wants help today. They also expressed interest in a recurring job, so they could become a repeat customer for you." — powered by an LLM and grounded in real signals about the customer, the job, and the pro's history. The goal: give pros enough information to make a confident decision about whether to accept or pass, while building trust that Thumbtack's matching was working in their favor.

This project addressed one part of the hypothesis: explaining how Thumbtack makes matches. A separate, parallel project tackled the other side — building a new matching algorithm from the ground up that moved away from an exact matches approach entirely.

How the team approached it

01

Ideation & framing

Explored approaches to match labels and explanations, balancing personalization with engineering effort

02

UX research

"Lead" language resonated most with pros. Personalized labels weren't worth the engineering lift

03

Prompt engineering

Worked with Applied Science across 5 major revisions — format, structure, attributes, voice, and tone

04

LLM evaluation

Developed a weighted evaluation framework with rule-based criteria and critical fail triggers for human review

05

Scaling

Overhauled the style guide for model training; developed a Prompt Engineering Toolkit used company-wide

The prompt

The prompt was the core of the work. The initial version instructed the model to act as a "persuasive yet honest merchant" — generating an explanation of why a pro should take a job, balanced with any reasons they shouldn't. It was given structured data about the job, the customer, and the pro's history, and asked to return an explanation in JSON format.

From V1, a set of core content principles emerged that would shape every subsequent revision:

Surface non-obvious information

Explanations shouldn't just summarize what pros already know — they should highlight insights the pro couldn't easily deduce on their own

Manage expectations honestly

Lead with the fit, but balance it with relevant caveats — err on the side of under-promising and over-delivering

Avoid demographic and value judgments

Never frame an opinion about a customer in the definitive. Focus on non-demographic signals like response rate, repeat booking, and proximity to upcoming jobs

The initial prompt

Prompt iteration in practice

The team went through 5 major revisions, working closely with Applied Science and Engineering. Each round was validated with rolling research. The V1 output was long and overwhelming — a wall of text that buried the key insight and made it hard for pros to quickly scan and act. By V5, the prompt had been refined to produce short, targeted, personalized explanations that surfaced what mattered most to each individual pro.

              V1 Initial output
            

V1 model output: early 'Why this lead' explanation — verbose and overwhelming, not yet tailored to the pro's specific history or the customer's actual context

              V5 Refined output
            

V5 model output: short, targeted, personalized explanation — 'This customer wants help today. They also expressed interest in a recurring job, so they could become a repeat customer for you.'

Building the evaluator

Alongside the prompt work, the team built an evaluation framework to assess output quality at scale. Because manually reviewing every model output wasn't sustainable, the team used an LLM-as-a-judge approach: a second model evaluated the generated explanations against a defined set of quality criteria, flagging outputs that fell short for human review. This allowed the team to run evaluations continuously as the prompt evolved, rather than relying solely on spot checks or user research.

The evaluation rubric drew from content design principles — voice and tone, readability, scannability — adapted into machine-readable criteria the judge model could apply consistently.

What we learned

Our initial proposal was to adapt content guidelines into core criteria for an LLM-as-judge evaluation. The learnings reshaped the approach: too many criteria degrade output quality, and highly context-dependent rules are hard for models to evaluate reliably.

The final framework used a highly focused set of rule-based core criteria, a weighted evaluation strategy, and critical fail criteria to trigger human review — with an adaptable rubric for feature-specific evaluation post-launch.

We also identified a gap: no heuristic existed for the ethical use of AI in our content. We built one.

Weighted heuristic evaluation framework with criteria for voice and tone, readability, accessibility, scannability, and style adherence

Making it scale beyond one feature

The work on match explanations became the foundation for how Thumbtack approaches generative AI content more broadly.

Prompt Engineering Toolkit

Used by all teams working on generative AI content at the company

LLM-as-judge framework

Standardized evaluation approach for AI-generated product content

AI Content Working Group

Cross-functional alignment on AI content principles and guidelines

Style guide overhaul

Rebuilt from the ground up to be robust enough for model training

Content Strategy

Getting creators to claim their money — before it disappeared

The Creator Rewards program let a select group of Pinterest creators earn money by completing monthly challenges. To get paid, creators had to complete payments setup through a third-party processor. Many didn't — or couldn't. As the program scaled, unclaimed funds became a growing financial and legal risk. Pinterest made the decision to expire funds unclaimed after 60 days.

The challenge: get creators to act on something they'd been ignoring — without creating panic, eroding trust, or generating negative press.

Four surfaces. One escalating strategy.

✉️

Lead channel. Established context, explained the deadline, and guided setup with clear steps.

🔔

Push notification

Urgency amplifier as the deadline approached. Short, specific, action-oriented.

🗂️

Modal

Interrupted the session at the right moment. Named the amount at risk to make it concrete.

📣

Banner

Persistent reminder visible throughout the product as expiration neared.

Tone escalation over 60 days

Day 1 — Informational Day 30 — Guidance Day 50 — Urgent Day 60

The full content strategy — four surfaces, escalating over 60 days

The strategic details that made it work

I worked closely with Legal and Creator Operations to ensure every message was accurate, compliant, and timed correctly. Tone moved from informational to increasingly urgent — but never alarmist.

I also negotiated with the support team to route creators directly to a support ticket form when funds were about to expire. This created a clear, actionable path for creators who genuinely couldn't resolve the issue themselves — and reduced creator frustration.

Every message named the specific dollar amount at risk — but how, where, and at what weight shifted deliberately as the deadline approached. Early messages led with information and opportunity. Later ones made the loss feel immediate and concrete, without tipping into alarm.

Impact

~100%

of eligible creators claimed their funds

negative press coverage generated

The strategy was reused for tax form collection

Applied the same escalating content model to get creators to enter their full SSNs for tax forms. 85% of creators with missing SSNs added them within the first 2 days.

Tool-building

Building an AI writing assistant for partners — inside Figma

Thumbtack

Content designers were writing content directly for partners — a model that didn't scale and pulled focus from higher-impact model design work. Partners needed a way to write content themselves, guided by Thumbtack's guidelines, so they could come to office hours with drafts ready for feedback rather than starting from scratch.

What if partners could write content themselves in Figma — the tool where they're already working — and bring polished drafts to office hours, freeing content designers to focus on the model design work that only they can do?

How it works

🎨 Figma plugin

Select any layer → pull text into chat

↓

☁️ Cloudflare Worker

Secure API proxy — key never exposed in plugin code

↓

🤖 GPT-4o

Guided by full Thumbtack content guidelines as system prompt

What it knows

Voice & tone Component constraints Email rules Forbidden patterns Preferred terms Brand messaging GTM email examples

The process I used to build it

I built both tools myself — the writing GPT and the Figma plugin — in just a few days from start to finish. The goal was to give partners writing help inside the tool where they were already working, rather than asking them to context-switch to a separate product.

I started with a Custom GPT on chatgpt.com: no infrastructure, just prompt iteration until the model reliably applied Thumbtack's guidelines, asked the right clarifying questions before writing emails, and always returned full rewrites. Then I used Claude Code to build the Figma plugin, describing the behavior I wanted in plain language rather than writing code myself. The whole thing runs on Cloudflare Workers to keep the API key secure.

This project is also how I learned that the skills content designers already have — identifying problems, defining requirements, working with prompts, defining model behavior, and iterating on output — are exactly the skills that make someone good at building AI tools.

Features

🔗

Pull text from any Figma layer

Recursively extracts all text nodes from selected frames, components, or layers

💬

Persistent chat with full conversation history

Iterate on rewrites and ask follow-ups in a single session

⚡

One-click suggestion chips

Review copy, rewrite for Thumbtack voice, check a button label, write an error message

How I built it

Diversity, equity & inclusion

Building the infrastructure
for more inclusive products.

Across Meta and Pinterest, I've built programs that didn't exist before — not just guidelines on a page, but structures that give underrepresented voices ongoing influence on the products that affect them.

Meta

Disability Review Board

There was no forum for teams to get feedback from people with disabilities about the products and communications affecting them. Teams relied on a handful of individuals who were open about their disabilities — and many disability types had no designated reviewer at all.

I founded the Disability Review Board: a group of people with disabilities who provide input based on lived experience — from wheelchair representations in avatars to training for advertisers.

14 reviews in the first 3 months

100% rated the feedback very or extremely helpful

88% said it had very or extremely high impact on their final work

Pinterest

Inclusive Terminology Guide

There was no comprehensive guide to inclusive terminology at the company. I recruited contributors from across Pinterest, with each section led by someone from the relevant ERG to center lived experience. Coordinated reviews with PR, Legal, DEI, Learning & Development, and executives — launched in three months.

Work shared selectively.

Case
Studies

Generating match explanations with an LLM

Getting creators to claim their money — before it disappeared

Building an AI writing assistant for partners — inside Figma

Building the infrastructure
for more inclusive products.

Disability Review Board

Inclusive Terminology Guide

Work shared selectively.

CaseStudies

Generating match explanations with an LLM

Getting creators to claim their money — before it disappeared

Building an AI writing assistant for partners — inside Figma

Building the infrastructurefor more inclusive products.

Disability Review Board

Inclusive Terminology Guide

Case
Studies

Building the infrastructure
for more inclusive products.