TL;DR
- A sales experiment that ships has 6 parts: hypothesis, single variable, control, sample size, kill rule, written learning. Skip any one and it becomes a directive, not a test.
- Sample floors: 200 prospects per variant for reply-rate tests, 20 calls for call-stage tests, 10 deals for win-rate tests. Below those, you are reading variance.
- Run 3 experiments per quarter. Queue the other 17 in a shared document. The queue is the compounding machine.
- Most outreach experiments read in 10 business days. Call experiments in a week of dialing. Cadence and process experiments across a full quarter.
- 6 failure modes kill most sales experiments: no kill rule, two variables changed, sample too small, no control, no writeup, wrong funnel stage. All six trace back to treating experiments as inspiration instead of a structured bet.
Snippet answer
The sales experiments worth running are structured tests — one variable, a control, a sample floor, a kill rule, and a written result — across outreach, call, cadence, and process workflows. This post covers 20 specific tests in four funnel buckets, a prioritization matrix to pick the right three for the quarter, the 10-day playbook that runs each one, and the 6 failure modes that kill most tests before they ship.
Why most sales teams stop experimenting after week 3
A VP reads a Gong report. Decides the team is going to test a new cold email opener. Monday morning: kickoff. Week one: three reps try it. Week two: two reps try it. Week three: one rep — the one who was never going to hit quota anyway. The "experiment" is dead, nobody called it, and nobody knows whether the new opener would have worked.
The test never had a chance. It had no hypothesis, no kill rule, no sample size, and no forum to read the result. It was a directive disguised as an experiment. Most sales experiments die this way — the team agrees to "try something different," leaves unclear what counts as different and what counts as success, and by week three the urgency of the quarter swallows the discipline of the test.
The sales experiments that actually change a quarter have a shape. A single variable. A sample big enough to read. A kill rule that fires when the variable underperforms. A forum — usually the weekly pipeline call — where the result gets read out loud in 90 seconds.
This post is that shape, applied to 20 experiments. Pick three. Run them for 10 business days. Read the result in the pipeline review. Kill the losers, scale the winner, queue the next three. The point is not testing for its own sake — it is how a team learns its own market faster than the competition learns it.
The anatomy of a sales experiment that ships
A sales experiment that ships has six parts. Skip any one and the thing turns into a directive nobody tracks. The rest of the post — and every one of the 20 experiments — is written against this shape.
- 1
The hypothesis
One sentence, written in "If X, then Y" form. "If we lead with the signal in the opener, reply rate climbs from 6% to 10%." Not "we should try signal-led openers" — that is a suggestion, not an experiment.
- 2
The single variable
One thing changes. Not the opener AND the CTA AND the send time. If three things change and replies move, you do not know which one moved them. Discipline here is what separates learning from noise.
- 3
The control
The old version runs beside the new version on the same kind of prospect. If the control runs on Tier-2 accounts and the variant runs on Tier-1, the result is a firmographic difference, not a copy difference. Split the same segment and send both the same week.
- 4
The sample size
Small enough to run in 10 business days; big enough to read. A sensible floor: 200 prospects per variant for reply-rate tests, 20 calls per variant for call-stage tests, 10 deals per variant for win-rate tests. Below that, you are reading variance (the noise-band sketch after this list shows why).
- 5
The kill rule
The condition that makes you stop the experiment before the sample fills. "If the variant replies under 2% after the first 75 prospects, kill it." A kill rule protects the pipeline from a bad variant that runs unchecked for three weeks.
- 6
The learning
Written in one paragraph, shared in the pipeline review. Not a slide deck — a paragraph. "We tested signal-led openers against pattern-led on 240 prospects split evenly. Signal-led replied at 9.2%, pattern-led at 5.8%. Rolling signal-led into the Growth segment; retiring pattern-led."
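A quick way to pressure-test the floors in part 4 is to look at the noise band on an observed reply rate at a given sample size. The sketch below is back-of-the-envelope binomial math, not a formal power calculation, and the baseline rate and sample sizes are illustrative.

```python
# Rough noise band on an observed reply rate: the standard error of a
# binomial proportion. Illustrative numbers only.
import math

def reply_rate_std_err(rate: float, n: int) -> float:
    """Standard error of an observed reply rate at sample size n."""
    return math.sqrt(rate * (1 - rate) / n)

baseline = 0.06  # 6% baseline reply rate
for n in (50, 100, 200):
    se = reply_rate_std_err(baseline, n)
    # A gap between variants smaller than roughly two standard errors
    # is hard to tell apart from variance.
    print(f"n={n:>3} per variant: +/- {2 * se * 100:.1f} pp noise band")

# n= 50 per variant: +/- 6.7 pp noise band
# n=100 per variant: +/- 4.7 pp noise band
# n=200 per variant: +/- 3.4 pp noise band
```

At 50 prospects per variant the band is wider than most of the lifts in this post, which is why a "winner" at that sample size is usually variance.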
If the shape sounds obvious, good — running it every time is the thing teams skip. The gap between "we tested this" and "we know this" is the shape. Reps write down what they learned, managers compound the learnings into the playbook, the next new rep inherits a sharper starting point than the last.
The experiments in the rest of this post are all written to fit this shape. Each has a hypothesis, a variable, a sample floor, and a kill rule. Pick the ones that match the quarter; ignore the rest for now and drop them in the queue document for next quarter.
5 outreach experiments to run this quarter
Outreach experiments move the top of the funnel fast. Reply-rate tests read in 10 business days with 200 prospects per variant — the smallest, cheapest way to generate learning. Every experiment below has the same shape: one variable, same segment split evenly, 10-day window, written kill rule. Run any two of the five this quarter.
Subject line format
Variant: question ("Why is Acme still on Apollo?") vs stat ("35% of your pipeline is Tier-2") vs one-word lowercase ("timing").
Hypothesis: One-word beats stat beats question on open rate for our ICP.
Send time
Variant: Same email body, three send times local to the prospect: 6:15am, 10:30am, 4:45pm.
Hypothesis: 6:15am beats 10:30am and 4:45pm because the email is the first one the buyer sees that day.
Opener format
Variant: Signal-led ("Saw Acme posted a VP Sales role Thursday — curious if outbound is on the list for Q2") vs pattern-interrupt ("This is a cold email — skip if that is not your thing").
Hypothesis: Signal-led outperforms pattern-interrupt by 1.5× on reply rate.
Body length
Variant: 3-sentence body (one pain, one proof, one ask) vs 5-sentence body (signal, pain, proof, customer example, ask). Measure reply rate AND meeting-show rate.
Hypothesis: 3-sentence wins on reply rate; 5-sentence wins on meeting quality (fewer no-shows).
Channel sequence
Variant: Email-first → LinkedIn connect day 3 → LinkedIn DM day 5 vs LinkedIn connect day 1 → LinkedIn DM day 2 → email day 4.
Hypothesis: LinkedIn-first wins on reply rate for Series-A-and-up prospects because the warmth of a connection carries the email.
Across all five, the shape is the same — one variable, 10 days, a kill rule. The winner gets rolled into the default sequence; the losers get retired. The learning goes in the sales playbook so the next rep inherits the answer, not the question.
5 call performance experiments worth running
Call experiments need smaller samples but more patience — 20 calls per variant takes a full week for most AEs. The payoff is directly visible in stage progression. These are the tests that move deals, not just replies. Assign one to each senior rep; have juniors shadow to speed up the sample.
Opening format
Variant: Agenda-first ("Three things I would love to cover — your environment, your priorities, what a next step would look like") vs story-first ("Last month we worked with a company that looked a lot like yours...").
Hypothesis: Agenda-first wins on demo→evaluation conversion because the buyer trusts the rep is organized.
Discovery depth
Variant: One-pain deep-dive ("What is the single most urgent sales problem you have right now?" + 15 min of follow-up) vs 5-question frame ("Let me ask five things...").
Hypothesis: One-pain wins on deal quality; 5-question wins on breadth.
Demo sequencing
Variant: Pain-led demo (three features tied to the pain the buyer named) vs tour demo (five most-used features in order).
Hypothesis: Pain-led doubles the probability of a next-step close on the call.
Close phrasing
Variant: "Does this make sense as a next step?" vs "What would change for your team if you had this in 30 days?"
Hypothesis: The open-ended close surfaces the champion internal narrative and produces more specific next steps.
ROI doc timing
Variant: Send the ROI one-pager 24 hours before the pricing call vs within 30 minutes after.
Hypothesis: Sending before makes the pricing call shorter and surfaces the objection earlier.
Call experiments feel slower than outreach ones, but they move the deal, not just the reply. Two reps each running one call experiment for 10 business days generate 40 data points per experiment — enough to bet the playbook on. Write the result paragraph the same Friday the sample closes.
5 pipeline and cadence experiments that move win rate
Cadence experiments read over two to three weeks because the variable is the shape of the sequence, not a single touch. Win-rate tests take a full quarter. Run both. The payoff on cadence experiments compounds longer than the sprint horizon — a 2-point reply-rate lift on a 300-prospect-a-month cadence produces 72 extra conversations across a year.
Cadence length
Variant: 8 touches over 14 days vs 12 touches over 14 days.
Hypothesis: 12-touch wins on reply rate but loses on opt-out rate — read both.
Cadence duration
Variant: 8 touches over 14 days vs 8 touches over 21 days.
Hypothesis: 21-day wins on reply rate because the buyer has time to return to a dormant thread.
Touch gap
Variant: Day-1 touch → day-2 follow-up vs day-1 touch → day-4 follow-up.
Hypothesis: Day-2 follow-up wins on reply rate because the buyer remembers the first message.
Channel mix
Variant: Email-heavy (6 email, 2 LinkedIn) vs hybrid (4/4) vs LinkedIn-heavy (2/6).
Hypothesis: Hybrid wins on reply rate for senior buyers (VP+).
Breakup email ask
Variant: Specific-ask ("if timing changes, reply ‘pilot’ and we will re-engage") vs vague breakup email with no ask.
Hypothesis: A one-word-reply ask triples the breakup-reply rate.
Cadence tests compound. A 2-percentage-point lift on a 300-prospect-a-month cadence produces 6 extra conversations a month. Over four quarters that is 72 — and 72 conversations at a 20% meeting rate is 14 net-new meetings from a single, well-designed test. Most teams never run it because "cadence" sounds too structural to A/B test.
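The compounding claim is simple arithmetic; the sketch below lays it out with the illustrative numbers from this section.

```python
# Compounding math for a cadence test, using the illustrative numbers above.
prospects_per_month = 300
reply_rate_lift = 0.02   # 2 percentage points
meeting_rate = 0.20      # share of extra conversations that become meetings

extra_conversations_per_month = prospects_per_month * reply_rate_lift  # 6
extra_conversations_per_year = extra_conversations_per_month * 12      # 72
extra_meetings_per_year = extra_conversations_per_year * meeting_rate  # 14.4, roughly 14

print(extra_conversations_per_year, extra_meetings_per_year)
```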
5 CRM and process experiments most teams skip
Process experiments do not move reply rate. They move rep hours. Rep hours move the number of calls a rep can run; that moves the number of deals; that moves the quarter. Skip these and you are tuning the engine while leaving the car in second gear. The five below read across a full quarter — start them alongside the faster outreach tests.
Time-blocked prospecting
Variant: Every rep blocks 9–11am Tue/Thu for prospecting (no meetings, no CRM) vs prospecting stays opportunistic between calls.
Hypothesis: Blocked time doubles first-touches sent per rep per day.
Note timing
Variant: Write the CRM note within 2 minutes of hanging up (or auto-draft + 30-second review) vs batch-write notes at end of day.
Hypothesis: 2-minute notes are 3× more accurate on stage-and-next-step and reduce end-of-day fatigue.
Qualification framework
Variant: Run MEDDIC vs BANT against the same 10 deals.
Hypothesis: MEDDIC surfaces more missing Metrics and missing Economic Buyers, which tightens the forecast.
Pipeline inspection cadence
Variant: Weekly 30-minute pipeline review vs bi-weekly 60-minute.
Hypothesis: Weekly catches stalling deals 10 days earlier than bi-weekly.
Close-date discipline
Variant: Rep-set close date only vs rep-set close date adjusted by manager if it has not moved in 21 days.
Hypothesis: Manager adjustment on stalled close dates produces a more honest forecast without hurting rep morale.
Process experiments feel like management overhead to reps. Frame them the other way: the point is to free the rep from work that does not move the number. Every one of these removes admin drag or surfaces the half-dead deal earlier — both of which buy the rep back an afternoon a week.
20
Experiments in this post
5 outreach · 5 call · 5 cadence · 5 process.
3
Experiments per quarter
Run concurrently. Queue the other 17.
10 days
To read most outreach tests
200 prospects per variant. Extend only to fill the sample, never to chase a result.
6
Parts of a shipping experiment
Hypothesis · variable · control · sample · kill rule · writeup.
How to pick the 3 experiments to run this quarter
Twenty experiments is too many to run at once. Three is the right number for a team of four to seven reps. Two for a team of three or fewer. Picking well is the one decision that makes the whole post useful — the other 17 go in a queue, not in the rearview.
The prioritization matrix. Score each experiment on two axes: (1) likely effect size at the quarter level, 1–5; (2) cost to run in rep hours, 1–5 where 5 is cheapest. Multiply. Run the top three.
| Experiment | Effect (1–5) | Cost-inverse (1–5) | Score |
|---|---|---|---|
| 03 · Opener test | 4 | 5 | 20 |
| 13 · Touch gap | 3 | 5 | 15 |
| 17 · Note timing | 2 | 4 | 8 |
| 06 · Opening format | 3 | 2 | 6 |
A team with a reply-rate problem using this matrix runs experiments 3 and 13 first, queues 17 for next quarter, and deprioritizes 06. The math is coarse on purpose — the point is to break ties, not to rank 20 tests to three decimals.
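For teams that keep the queue in a spreadsheet export or a script, the matrix reduces to a multiply-and-sort. A minimal sketch with the four example rows from the table; the final slate still has to pass the three-pick rules below.

```python
# Prioritization matrix: score = effect (1-5) x cost-inverse (1-5, 5 = cheapest).
# Rows are the example experiments from the table above.
experiments = [
    {"id": "03", "name": "Opener test",    "effect": 4, "cost_inverse": 5},
    {"id": "13", "name": "Touch gap",      "effect": 3, "cost_inverse": 5},
    {"id": "17", "name": "Note timing",    "effect": 2, "cost_inverse": 4},
    {"id": "06", "name": "Opening format", "effect": 3, "cost_inverse": 2},
]

for e in experiments:
    e["score"] = e["effect"] * e["cost_inverse"]

# The ranking breaks ties; the three-pick rules decide the final slate.
for e in sorted(experiments, key=lambda e: e["score"], reverse=True):
    print(f"{e['id']} · {e['name']}: {e['score']}")
```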
The three-pick rules:
- No two experiments share the same variable. If opener AND send time change this quarter, neither result reads cleanly.
- At least one experiment hits the part of the funnel underperforming the team average. Do not test what is already best.
- At least one experiment is cheap enough for one rep to run alone — so the team is not blocked if an experiment breaks mid-sprint.
The mistake most teams make is picking the three experiments they want to run. Pick the three that answer the question the team cannot answer today. "Is our opener broken?" is a better experiment prompt than "let us try that new opener I read about."
The 10-day sales experiment playbook
An experiment does not need a Gantt chart. It needs a plan written out once, a two-week window, and a forum to read the result. Most outreach tests read in 10 business days, and the playbook below maps day by day to that cadence.
- Day 0
Friday before. Write the hypothesis on one line, the variant copy on another, the sample size on a third, the kill rule on a fourth. Slack it to the team. If anyone cannot describe the experiment in one sentence after reading, rewrite.
- Day 1
Monday launch. Split the prospect list 50/50. Launch both variants in parallel. For outreach, the sequence goes live in both versions; for calls, reps draw from a shared pool and record which variant they ran. (A minimal split-and-tag sketch follows this list.)
- Days 2–4
Watch the rates. Track reply, open, and book rates daily. If the variant trips the kill rule, kill it. If the control collapses (a sign that the data is off, not that the control is bad), pause and debug — do not rescue with more sample.
- Day 5
Halfway checkpoint. Are both variants on track to hit sample size by Day 10? If not, add prospects or extend the window by 3 days. Do not shrink the sample to fit the calendar — small samples are why teams reach wrong conclusions.
- Day 10
Freeze and read. Sample size hits. Freeze both variants. Pull the numbers from the CRM or outreach tool. No peeking early — the temptation to read it at Day 7 is what produces false positives.
- Day 11
Write the paragraph. Four sentences max: what was tested, what was seen, what the team is doing next, what to test next. Share in the pipeline review on Monday — 90 seconds on the screen, not a deck.
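The Day 1 split is the step teams most often fumble in practice: uneven lists, or prospects with no variant tag by Day 10. Below is a minimal sketch of a seeded 50/50 split; the prospect records and field names are hypothetical placeholders, not any particular CRM's schema.

```python
# Seeded 50/50 split of one segment into variant A and variant B.
# Prospect records and field names are hypothetical placeholders.
import random

def split_prospects(prospects: list[dict], seed: int = 7) -> list[dict]:
    """Shuffle once with a fixed seed, then alternate A/B so both variants
    draw from the same segment mix and the split is reproducible."""
    rng = random.Random(seed)
    shuffled = prospects[:]
    rng.shuffle(shuffled)
    for i, prospect in enumerate(shuffled):
        prospect["variant"] = "A" if i % 2 == 0 else "B"
    return shuffled

segment = [{"email": f"prospect{i}@example.com"} for i in range(400)]
tagged = split_prospects(segment)
print(sum(p["variant"] == "A" for p in tagged))  # 200 per variant
```

Tag the variant on the record itself, not in a side spreadsheet, so the Day 10 read is a filter, not a reconstruction.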
The forum matters more than the timeline. An experiment read out loud in a pipeline review, with numbers on the screen, is how teams build muscle around decisions. An experiment buried in a Notion doc never gets believed. Give it 90 seconds in the weekly ritual — that is all. If the result is inconclusive (variants too close to call), write that too. Inconclusive is a valid finding; it means the variable mattered less than expected, which is a learning in itself.
The 6 failure modes that kill sales experiments
Every experiment that fizzles fails in one of six ways. Spot the pattern before launch and the quarter is saved. Miss it and the team burns 10 business days on a test that produces no learning worth acting on.
- 1
No kill rule
The variant underperforms by week one, but nobody calls it. The rep keeps running it to avoid looking like a quitter. Three weeks later the pipeline is thin and nobody learned anything. Fix: write the kill rule on Day 0. Post it in the channel. Let the number fire the decision, not the human.
- 2
Two variables changed at once
The opener AND the send time got swapped. Replies move, and nobody can say which change caused it. Fix: one variable. If the team wants to test two, run two experiments in sequence, not one double-barreled one.
- 3
Sample too small
50 prospects per variant feels like enough until the variance is 3 percentage points and the "winner" is noise. Fix: 200 prospects for reply-rate tests, 20 calls for call-stage tests, 10 deals for win-rate tests. Below those floors, call it a directional signal, not a result.
- 4
No control
The team tested the new opener against "the old opener from Q1" — except nobody ran Q1's opener this week, so the comparison is against a ghost. Fix: the control must run in parallel, same segment, same week.
- 5
Result never written up
The experiment ran, numbers came in, and the quarter closed before anyone wrote the paragraph. The learning evaporated. Fix: a one-paragraph writeup is mandatory. It goes in the shared queue document. The next experiment cannot start until the last one is written up.
- 6
Wrong funnel stage
A team at 1% reply rate runs a close-phrasing test. Reply rate is the bottleneck; close phrasing is the wrong place to spend the hours. Fix: always experiment against the constraint. Identify the weakest stage in the team funnel first.
The failure mode underneath all six is treating experiments as inspirational rather than structural. An experiment is a rep-hour investment with an expected return; when the team treats it as "let us try this cool thing," the discipline falls apart. Treat each test as a small, serious bet. The compounding follows.
How Gangly gives reps a clean testbed for sales experiments
Experiments need two things reps rarely have inside the stack: parallel variants running in the same tool, and result data that is not buried in CRM exports. Gangly runs the workflow so the first experiment is the cheapest one to launch — every one after runs on the same rails.
- Workflow Sequencer — runs A/B variants of a cadence on split prospect lists inside one sequence, tagged as variant A or B. The rep sees the live reply rate per variant; the manager reads the result in the dashboard instead of a CSV export.
- Outreach Writer — drafts two versions of an opener from the same signal, flagged by variant. Reps review and approve both. Gangly does the typing; the rep owns the experiment.
- Post-Call Notes — tags every call with the experiment variant, so a call-format test produces a clean 20-call sample without the rep remembering which call was which.
The point is not that Gangly replaces the experiment — the rep and the manager still write the hypothesis, pick the variants, and read the result. Gangly removes the admin that otherwise makes experiments feel heavier than they should. If a team is running fewer than three experiments a quarter, the likely reason is not a lack of ideas — it is the overhead of setting one up inside a tool that was not built for parallel variants.
Start with one cadence test and read the result in 10 business days. The compounding is what matters: a team that ships 12 experiments a year turns 12 opinions into data. A team that ships none is still arguing about opener structure in Q4 with the same conviction it had in Q1 — and the same lack of evidence.
Related reading: our sales battle cards post covers the structure winning experiments get turned into, and the cold email reply rate study is the data set most of the outreach experiments above are calibrated against. The sales admin time study quantifies the rep-hour cost of the process experiments in section six.
Run the first experiment
Ship 3 sales experiments this quarter. Read the first result in 10 days.
14-day free trial. Connect HubSpot or Salesforce in 3 minutes. No credit card.
Frequently asked questions
What are sales experiments?
Sales experiments are structured tests where a team changes one variable in their outreach, call, or process workflow and measures the effect over a fixed sample. Unlike a general "let us try something new," a sales experiment has a hypothesis, a single variable, a control, a sample size, a kill rule, and a written result. They are how sales teams learn what works on their specific buyers faster than the competition. Teams that run three per quarter ship 12 learnings a year.
How many sales experiments should a team run per quarter?
Three experiments per quarter is the sweet spot for a team of four to seven reps. Two for a team of three or fewer. More than three overloads the prospect list, makes results hard to attribute, and burns rep hours that should go into pipeline. Always queue the next three in a shared document so each quarter starts with clarity, not a brainstorming session. The queue compounds — every quarter starts smarter than the last.
What is the minimum sample size for a sales experiment?
Floor sample sizes depend on the metric: 200 prospects per variant for reply-rate tests, 20 calls per variant for call-stage tests, 10 deals per variant for win-rate tests. Below those floors you are reading variance, not signal. If a team cannot hit the floor in 10 business days, the right move is to extend the window — not to shrink the sample and draw a conclusion anyway. Small samples are the top reason sales experiments produce wrong answers.
How long should a sales experiment run?
Reply-rate tests run for 10 business days. Call-stage tests take one full week of calling per rep (about 20 calls). Win-rate tests run for a full quarter because deals need time to close. Set the end date on Day 0 and stick to it — extending because the data "looks close" introduces bias. If the result is inconclusive at the end, log it as inconclusive and move on. Inconclusive is a valid finding.
What makes a sales experiment fail?
Six failure modes kill most sales experiments: no kill rule, two variables changed at once, sample too small, no control, result never written up, and running experiments in the wrong funnel stage. All six trace back to treating experiments as inspiration instead of as a structural rep-hour investment. The fix is to write the hypothesis, variable, sample, and kill rule on Day 0 — before launch — and to read the result in the weekly pipeline review.
Can small sales teams run experiments, or is it only for big teams?
Small sales teams benefit more from experiments than big ones. A two-rep team has less data, so each experiment produces a higher information gain per rep-hour. The modifications are practical: pick experiments a single rep can run (so the team is not blocked if one breaks), prefer outreach experiments over call experiments (faster to hit sample size), and lean on the shared queue document so learnings compound across quarters even as reps come and go.