TL;DR
- A sales experiment that ships has 6 parts: hypothesis, single variable, control, sample size, kill rule, written learning. Skip any one and it becomes a directive, not a test.
- Sample floors: 200 prospects per variant for reply-rate tests, 20 calls for call-stage tests, 10 deals for win-rate tests. Below those, you are reading variance.
- Run 3 experiments per quarter. Queue the other 17 in a shared document. The queue is the compounding machine.
- Most outreach experiments read in 10 business days. Call experiments in a week of dialing. Cadence and process experiments across a full quarter.
- 6 failure modes kill most sales experiments: no kill rule, two variables changed, sample too small, no control, no writeup, wrong funnel stage. All six trace back to treating experiments as inspiration instead of a structured bet.
Snippet answer
The sales experiments worth running are structured tests — one variable, a control, a sample floor, a kill rule, and a written result — across outreach, call, cadence, and process workflows. This post covers 20 specific tests in four funnel buckets, a prioritization matrix to pick the right three for the quarter, the 10-day playbook that runs each one, and the 6 failure modes that kill most tests before they ship.
Why most sales teams stop experimenting after week 3
A VP reads a Gong report. Decides the team is going to test a new cold email opener. Monday morning: kickoff. Week one: three reps try it. Week two: two reps try it. Week three: one rep — the one who was never going to hit quota anyway. The "experiment" is dead, nobody called it, and nobody knows whether the new opener would have worked.
The test never had a chance. It had no hypothesis, no kill rule, no sample size, and no forum to read the result. It was a directive disguised as an experiment. Most sales experiments die this way — the team agrees to "try something different," leaves unclear what counts as different and what counts as success, and by week three the urgency of the quarter swallows the discipline of the test.
The sales experiments that actually change a quarter have a shape. A single variable. A sample big enough to read. A kill rule that fires when the variable underperforms. A forum — usually the weekly pipeline call — where the result gets read out loud in 90 seconds.
This post is that shape, applied to 20 experiments. Pick three. Run them for 10 business days. Read the result in the pipeline review. Kill the losers, scale the winner, queue the next three. The point is not testing for its own sake — it is how a team learns its own market faster than the competition learns it.
The anatomy of a sales experiment that ships
A sales experiment that ships has six parts. Skip any one and the thing turns into a directive nobody tracks. The rest of the post — and every one of the 20 experiments — is written against this shape.
- 1
The hypothesis
One sentence, written in "If X, then Y" form. "If we lead with the signal in the opener, reply rate climbs from 6% to 10%." Not "we should try signal-led openers" — that is a suggestion, not an experiment.
- 2
The single variable
One thing changes. Not the opener AND the CTA AND the send time. If three things change and replies move, you do not know which one moved them. Discipline here is what separates learning from noise.
- 3
The control
The old version runs beside the new version on the same kind of prospect. If the control runs on Tier-2 accounts and the variant runs on Tier-1, the result is a firmographic difference, not a copy difference. Split the same segment and send both the same week.
- 4
The sample size
Small enough to run in 10 business days; big enough to read. A sensible floor: 200 prospects per variant for reply-rate tests, 20 calls per variant for call-stage tests, 10 deals per variant for win-rate tests. Below that, you are reading variance (the noise-band sketch after this list shows why).
- 5
The kill rule
The condition that makes you stop the experiment before the sample fills. "If the variant replies under 2% after the first 75 prospects, kill it." A kill rule protects the pipeline from a bad variant that runs unchecked for three weeks.
- 6
The learning
Written in one paragraph, shared in the pipeline review. Not a slide deck — a paragraph. "We tested signal-led openers against pattern-led on 240 prospects split evenly. Signal-led replied at 9.2%, pattern-led at 5.8%. Rolling signal-led into the Growth segment; retiring pattern-led."
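A quick way to pressure-test the floors in part 4 is to look at the noise band on an observed reply rate at a given sample size. The sketch below is back-of-the-envelope binomial math, not a formal power calculation, and the baseline rate and sample sizes are illustrative.

```python
# Rough noise band on an observed reply rate: the standard error of a
# binomial proportion. Illustrative numbers only.
import math

def reply_rate_std_err(rate: float, n: int) -> float:
    """Standard error of an observed reply rate at sample size n."""
    return math.sqrt(rate * (1 - rate) / n)

baseline = 0.06  # 6% baseline reply rate
for n in (50, 100, 200):
    se = reply_rate_std_err(baseline, n)
    # A gap between variants smaller than roughly two standard errors
    # is hard to tell apart from variance.
    print(f"n={n:>3} per variant: +/- {2 * se * 100:.1f} pp noise band")

# n= 50 per variant: +/- 6.7 pp noise band
# n=100 per variant: +/- 4.7 pp noise band
# n=200 per variant: +/- 3.4 pp noise band
```

At 50 prospects per variant the band is wider than most of the lifts in this post, which is why a "winner" at that sample size is usually variance.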
If the shape sounds obvious, good — running it every time is the thing teams skip. The gap between "we tested this" and "we know this" is the shape. Reps write down what they learned, managers compound the learnings into the playbook, the next new rep inherits a sharper starting point than the last.
The experiments in the rest of this post are all written to fit this shape. Each has a hypothesis, a variable, a sample floor, and a kill rule. Pick the ones that match the quarter; ignore the rest for now and drop them in the queue document for next quarter.
5 outreach experiments to run this quarter
Outreach experiments move the top of the funnel fast. Reply-rate tests read in 10 business days with 200 prospects per variant — the smallest, cheapest way to generate learning. Every experiment below has the same shape: one variable, same segment split evenly, 10-day window, written kill rule. Run any two of the five this quarter.
Subject line format
Variant: question ("Why is Acme still on Apollo?") vs stat ("35% of your pipeline is Tier-2") vs one-word lowercase ("timing").
Hypothesis: One-word beats stat beats question on open rate for our ICP.
Send time
Variant: Same email body, three send times local to the prospect: 6:15am, 10:30am, 4:45pm.
Hypothesis: 6:15am beats 10:30am and 4:45pm because the email is the first one the buyer sees that day.
Opener format
Variant: Signal-led ("Saw Acme posted a VP Sales role Thursday — curious if outbound is on the list for Q2") vs pattern-interrupt ("This is a cold email — skip if that is not your thing").
Hypothesis: Signal-led outperforms pattern-interrupt by 1.5× on reply rate.
Body length
Variant: 3-sentence body (one pain, one proof, one ask) vs 5-sentence body (signal, pain, proof, customer example, ask). Measure reply rate AND meeting-show rate.
Hypothesis: 3-sentence wins on reply rate; 5-sentence wins on meeting quality (fewer no-shows).
Channel sequence
Variant: Email-first → LinkedIn connect day 3 → LinkedIn DM day 5 vs LinkedIn connect day 1 → LinkedIn DM day 2 → email day 4.
Hypothesis: LinkedIn-first wins on reply rate for Series-A-and-up prospects because the warmth of a connection carries the email.
Across all five, the shape is the same — one variable, 10 days, a kill rule. The winner gets rolled into the default sequence; the losers get retired. The learning goes in the sales playbook so the next rep inherits the answer, not the question.
5 call performance experiments worth running
Call experiments need smaller samples but more patience — 20 calls per variant takes a full week for most AEs. The payoff is directly visible in stage progression. These are the tests that move deals, not just replies. Assign one to each senior rep; have juniors shadow to speed up the sample.
Opening format
Variant: Agenda-first ("Three things I would love to cover — your environment, your priorities, what a next step would look like") vs story-first ("Last month we worked with a company that looked a lot like yours...").
Hypothesis: Agenda-first wins on demo→evaluation conversion because the buyer trusts the rep is organized.
Discovery depth
Variant: One-pain deep-dive ("What is the single most urgent sales problem you have right now?" + 15 min of follow-up) vs 5-question frame ("Let me ask five things...").
Hypothesis: One-pain wins on deal quality; 5-question wins on breadth.
Demo sequencing
Variant: Pain-led demo (three features tied to the pain the buyer named) vs tour demo (five most-used features in order).
Hypothesis: Pain-led doubles the probability of a next-step close on the call.
Close phrasing
Variant: "Does this make sense as a next step?" vs "What would change for your team if you had this in 30 days?"
Hypothesis: The open-ended close surfaces the champion internal narrative and produces more specific next steps.
ROI doc timing
Variant: Send the ROI one-pager 24 hours before the pricing call vs within 30 minutes after.
Hypothesis: Sending before makes the pricing call shorter and surfaces the objection earlier.
Call experiments feel slower than outreach ones, but they move the deal, not just the reply. Two reps each running one call experiment for 10 business days generate 40 data points per experiment — enough to bet the playbook on. Write the result paragraph the same Friday the sample closes.
5 pipeline and cadence experiments that move win rate
Cadence experiments read over two to three weeks because the variable is the shape of the sequence, not a single touch. Win-rate tests take a full quarter. Run both. The payoff on cadence experiments compounds longer than the sprint horizon — a 2-point reply-rate lift on a 300-prospect-a-month cadence produces 72 extra conversations across a year.
Cadence length
Variant: 8 touches over 14 days vs 12 touches over 14 days.
Hypothesis: 12-touch wins on reply rate but loses on opt-out rate — read both.
Cadence duration
Variant: 8 touches over 14 days vs 8 touches over 21 days.
Hypothesis: 21-day wins on reply rate because the buyer has time to return to a dormant thread.
Touch gap
Variant: Day-1 touch → day-2 follow-up vs day-1 touch → day-4 follow-up.
Hypothesis: Day-2 follow-up wins on reply rate because the buyer remembers the first message.
Channel mix
Variant: Email-heavy (6 email, 2 LinkedIn) vs hybrid (4/4) vs LinkedIn-heavy (2/6).
Hypothesis: Hybrid wins on reply rate for senior buyers (VP+).
Breakup email ask
Variant: Specific-ask ("if timing changes, reply ‘pilot’ and we will re-engage") vs vague breakup email with no ask.
Hypothesis: A one-word-reply ask triples the breakup-reply rate.
Cadence tests compound. A 2-percentage-point lift on a 300-prospect-a-month cadence produces 6 extra conversations a month. Over four quarters that is 72 — and 72 conversations at a 20% meeting rate is 14 net-new meetings from a single, well-designed test. Most teams never run it because "cadence" sounds too structural to A/B test.
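The compounding claim is simple arithmetic; the sketch below lays it out with the illustrative numbers from this section.

```python
# Compounding math for a cadence test, using the illustrative numbers above.
prospects_per_month = 300
reply_rate_lift = 0.02   # 2 percentage points
meeting_rate = 0.20      # share of extra conversations that become meetings

extra_conversations_per_month = prospects_per_month * reply_rate_lift  # 6
extra_conversations_per_year = extra_conversations_per_month * 12      # 72
extra_meetings_per_year = extra_conversations_per_year * meeting_rate  # 14.4, roughly 14

print(extra_conversations_per_year, extra_meetings_per_year)
```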
5 CRM and process experiments most teams skip
Process experiments do not move reply rate. They move rep hours. Rep hours move the number of calls a rep can run; that moves the number of deals; that moves the quarter. Skip these and you are tuning the engine while leaving the car in second gear. The five below read across a full quarter — start them alongside the faster outreach tests.
Time-blocked prospecting
Variant: Every rep blocks 9–11am Tue/Thu for prospecting (no meetings, no CRM) vs prospecting stays opportunistic between calls.
Hypothesis: Blocked time doubles first-touches sent per rep per day.
Note timing
Variant: Write the CRM note within 2 minutes of hanging up (or auto-draft + 30-second review) vs batch-write notes at end of day.
Hypothesis: 2-minute notes are 3× more accurate on stage-and-next-step and reduce end-of-day fatigue.
Qualification framework
Variant: Run MEDDIC vs BANT against the same 10 deals.
Hypothesis: MEDDIC surfaces more missing Metrics and missing Economic Buyers, which tightens the forecast.
Pipeline inspection cadence
Variant: Weekly 30-minute pipeline review vs bi-weekly 60-minute.
Hypothesis: Weekly catches stalling deals 10 days earlier than bi-weekly.
Close-date discipline
Variant: Rep-set close date only vs rep-set close date adjusted by manager if it has not moved in 21 days.
Hypothesis: Manager adjustment on stalled close dates produces a more honest forecast without hurting rep morale.
Process experiments feel like management overhead to reps. Frame them the other way: the point is to free the rep from work that does not move the number. Every one of these removes admin drag or surfaces the half-dead deal earlier — both of which buy the rep back an afternoon a week.
20
Experiments in this post
5 outreach · 5 call · 5 cadence · 5 process.
3
Experiments per quarter
Run concurrently. Queue the other 17.
10 days
To read most outreach tests
200 prospects per variant. Extend only to fill the sample, never to chase a result.
6
Parts of a shipping experiment
Hypothesis · variable · control · sample · kill rule · writeup.
How to pick the 3 experiments to run this quarter
Twenty experiments is too many to run at once. Three is the right number for a team of four to seven reps. Two for a team of three or fewer. Picking well is the one decision that makes the whole post useful — the other 17 go in a queue, not in the rearview.
The prioritization matrix. Score each experiment on two axes: (1) likely effect size at the quarter level, 1–5; (2) cost to run in rep hours, 1–5 where 5 is cheapest. Multiply. Run the top three.
| Experiment | Effect (1–5) | Cost-inverse (1–5) | Score |
|---|---|---|---|
| 03 · Opener test | 4 | 5 | 20 |
| 13 · Touch gap | 3 | 5 | 15 |
| 17 · Note timing | 2 | 4 | 8 |
| 06 · Opening format | 3 | 2 | 6 |
A team with a reply-rate problem using this matrix runs experiments 3 and 13 first, queues 17 for next quarter, and deprioritizes 06. The math is coarse on purpose — the point is to break ties, not to rank 20 tests to three decimals.
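For teams that keep the queue in a spreadsheet export or a script, the matrix reduces to a multiply-and-sort. A minimal sketch with the four example rows from the table; the final slate still has to pass the three-pick rules below.

```python
# Prioritization matrix: score = effect (1-5) x cost-inverse (1-5, 5 = cheapest).
# Rows are the example experiments from the table above.
experiments = [
    {"id": "03", "name": "Opener test",    "effect": 4, "cost_inverse": 5},
    {"id": "13", "name": "Touch gap",      "effect": 3, "cost_inverse": 5},
    {"id": "17", "name": "Note timing",    "effect": 2, "cost_inverse": 4},
    {"id": "06", "name": "Opening format", "effect": 3, "cost_inverse": 2},
]

for e in experiments:
    e["score"] = e["effect"] * e["cost_inverse"]

# The ranking breaks ties; the three-pick rules decide the final slate.
for e in sorted(experiments, key=lambda e: e["score"], reverse=True):
    print(f"{e['id']} · {e['name']}: {e['score']}")
```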
The three-pick rules:
- No two experiments share the same variable. If opener AND send time change this quarter, neither result reads cleanly.
- At least one experiment hits the part of the funnel underperforming the team average. Do not test what is already best.
- At least one experiment is cheap enough for one rep to run alone — so the team is not blocked if an experiment breaks mid-sprint.
The mistake most teams make is picking the three experiments they want to run. Pick the three that answer the question the team cannot answer today. "Is our opener broken?" is a better experiment prompt than "let us try that new opener I read about."
The 10-day sales experiment playbook
An experiment does not need a Gantt chart. It needs a plan written out once, a two-week window, and a forum to read the result. Most outreach tests read in 10 business days, and the playbook below maps day by day to that cadence.
- Day 0
Friday before. Write the hypothesis on one line, the variant copy on another, the sample size on a third, the kill rule on a fourth. Slack it to the team. If anyone cannot describe the experiment in one sentence after reading, rewrite.
- Day 1
Monday launch. Split the prospect list 50/50. Launch both variants in parallel. For outreach, the sequence goes live in both versions; for calls, reps draw from a shared pool and record which variant they ran. (A minimal split-and-tag sketch follows this list.)
- Days 2–4
Watch the rates. Track reply, open, and book rates daily. If the variant trips the kill rule, kill it. If the control collapses (a sign that the data is off, not that the control is bad), pause and debug — do not rescue with more sample.
- Day 5
Halfway checkpoint. Are both variants on track to hit sample size by Day 10? If not, add prospects or extend the window by 3 days. Do not shrink the sample to fit the calendar — small samples are why teams reach wrong conclusions.
- Day 10
Freeze and read. Sample size hits. Freeze both variants. Pull the numbers from the CRM or outreach tool. No peeking early — the temptation to read it at Day 7 is what produces false positives.
- Day 11
Write the paragraph. Four sentences max: what was tested, what was seen, what the team is doing next, what to test next. Share in the pipeline review on Monday — 90 seconds on the screen, not a deck.
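The Day 1 split is the step teams most often fumble in practice: uneven lists, or prospects with no variant tag by Day 10. Below is a minimal sketch of a seeded 50/50 split; the prospect records and field names are hypothetical placeholders, not any particular CRM's schema.

```python
# Seeded 50/50 split of one segment into variant A and variant B.
# Prospect records and field names are hypothetical placeholders.
import random

def split_prospects(prospects: list[dict], seed: int = 7) -> list[dict]:
    """Shuffle once with a fixed seed, then alternate A/B so both variants
    draw from the same segment mix and the split is reproducible."""
    rng = random.Random(seed)
    shuffled = prospects[:]
    rng.shuffle(shuffled)
    for i, prospect in enumerate(shuffled):
        prospect["variant"] = "A" if i % 2 == 0 else "B"
    return shuffled

segment = [{"email": f"prospect{i}@example.com"} for i in range(400)]
tagged = split_prospects(segment)
print(sum(p["variant"] == "A" for p in tagged))  # 200 per variant
```

Tag the variant on the record itself, not in a side spreadsheet, so the Day 10 read is a filter, not a reconstruction.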
The forum matters more than the timeline. An experiment read out loud in a pipeline review, with numbers on the screen, is how teams build muscle around decisions. An experiment buried in a Notion doc never gets believed. Give it 90 seconds in the weekly ritual — that is all. If the result is inconclusive (variants too close to call), write that too. Inconclusive is a valid finding; it means the variable mattered less than expected, which is a learning in itself.
The 6 failure modes that kill sales experiments
Every experiment that fizzles fails in one of six ways. Spot the pattern before launch and the quarter is saved. Miss it and the team burns 10 business days on a test that produces no learning worth acting on.
- 1
No kill rule
The variant underperforms by week one, but nobody calls it. The rep keeps running it to avoid looking like a quitter. Three weeks later the pipeline is thin and nobody learned anything. Fix: write the kill rule on Day 0. Post it in the channel. Let the number fire the decision, not the human.
- 2
Two variables changed at once
The opener AND the send time got swapped. Replies move, and nobody can say which change caused it. Fix: one variable. If the team wants to test two, run two experiments in sequence, not one double-barreled one.
- 3
Sample too small
50 prospects per variant feels like enough until the variance is 3 percentage points and the "winner" is noise. Fix: 200 prospects for reply-rate tests, 20 calls for call-stage tests, 10 deals for win-rate tests. Below those floors, call it a directional signal, not a result.
- 4
No control
The team tested the new opener against "the old opener from Q1" — except nobody ran Q1's opener this week, so the comparison is against a ghost. Fix: the control must run in parallel, same segment, same week.
- 5
Result never written up
The experiment ran, numbers came in, and the quarter closed before anyone wrote the paragraph. The learning evaporated. Fix: a one-paragraph writeup is mandatory. It goes in the shared queue document. The next experiment cannot start until the last one is written up.
- 6
Wrong funnel stage
A team at 1% reply rate runs a close-phrasing test. Reply rate is the bottleneck; close phrasing is the wrong place to spend the hours. Fix: always experiment against the constraint. Identify the weakest stage in the team funnel first.
The failure mode underneath all six is treating experiments as inspirational rather than structural. An experiment is a rep-hour investment with an expected return; when the team treats it as "let us try this cool thing," the discipline falls apart. Treat each test as a small, serious bet. The compounding follows.
How Gangly gives reps a clean testbed for sales experiments
Experiments need two things reps rarely have inside the stack: parallel variants running in the same tool, and result data that is not buried in CRM exports. Gangly runs the workflow so the first experiment is the cheapest one to launch — every one after runs on the same rails.
- Workflow Sequencer — runs A/B variants of a cadence on split prospect lists inside one sequence, tagged as variant A or B. The rep sees the live reply rate per variant; the manager reads the result in the dashboard instead of a CSV export.
- Outreach Writer — drafts two versions of an opener from the same signal, flagged by variant. Reps review and approve both. Gangly does the typing; the rep owns the experiment.
- Post-Call Notes — tags every call with the experiment variant, so a call-format test produces a clean 20-call sample without the rep remembering which call was which.
The point is not that Gangly replaces the experiment — the rep and the manager still write the hypothesis, pick the variants, and read the result. Gangly removes the admin that otherwise makes experiments feel heavier than they should. If a team is running fewer than three experiments a quarter, the likely reason is not a lack of ideas — it is the overhead of setting one up inside a tool that was not built for parallel variants.
Start with one cadence test and read the result in 10 business days. The compounding is what matters: a team that ships 12 experiments a year turns 12 opinions into data. A team that ships none is still arguing about opener structure in Q4 with the same conviction it had in Q1 — and the same lack of evidence.
Related reading: our sales battle cards post covers the structure winning experiments get turned into, and the cold email reply rate study is the data set most of the outreach experiments above are calibrated against. The sales admin time study quantifies the rep-hour cost of the process experiments in section six.
Run the first experiment
Ship 3 sales experiments this quarter. Read the first result in 10 days.
14-day free trial. Connect HubSpot or Salesforce in 3 minutes. No credit card.
Frequently asked questions
What are sales experiments?
Sales experiments are structured tests where a team changes one variable in their outreach, call, or process workflow and measures the effect over a fixed sample. Unlike a general "let us try something new," a sales experiment has a hypothesis, a single variable, a control, a sample size, a kill rule, and a written result. They are how sales teams learn what works on their specific buyers faster than the competition. Teams that run three per quarter ship 12 learnings a year.
How many sales experiments should a team run per quarter?
Three experiments per quarter is the sweet spot for a team of four to seven reps. Two for a team of three or fewer. More than three overloads the prospect list, makes results hard to attribute, and burns rep hours that should go into pipeline. Always queue the next three in a shared document so each quarter starts with clarity, not a brainstorming session. The queue compounds — every quarter starts smarter than the last.
What is the minimum sample size for a sales experiment?
Floor sample sizes depend on the metric: 200 prospects per variant for reply-rate tests, 20 calls per variant for call-stage tests, 10 deals per variant for win-rate tests. Below those floors you are reading variance, not signal. If a team cannot hit the floor in 10 business days, the right move is to extend the window — not to shrink the sample and draw a conclusion anyway. Small samples are the top reason sales experiments produce wrong answers.
How long should a sales experiment run?
Reply-rate tests run for 10 business days. Call-stage tests take one full week of calling per rep (about 20 calls). Win-rate tests run for a full quarter because deals need time to close. Set the end date on Day 0 and stick to it — extending because the data "looks close" introduces bias. If the result is inconclusive at the end, log it as inconclusive and move on. Inconclusive is a valid finding.
What makes a sales experiment fail?
Six failure modes kill most sales experiments: no kill rule, two variables changed at once, sample too small, no control, result never written up, and running experiments in the wrong funnel stage. All six trace back to treating experiments as inspiration instead of as a structural rep-hour investment. The fix is to write the hypothesis, variable, sample, and kill rule on Day 0 — before launch — and to read the result in the weekly pipeline review.
Can small sales teams run experiments, or is it only for big teams?
Small sales teams benefit more from experiments than big ones. A two-rep team has less data, so each experiment produces a higher information gain per rep-hour. The modifications are practical: pick experiments a single rep can run (so the team is not blocked if one breaks), prefer outreach experiments over call experiments (faster to hit sample size), and lean on the shared queue document so learnings compound across quarters even as reps come and go.