Probe Design Workflow
This page describes the exact probe design pipeline used by designProbes and
designProbesBatch, including the order of operations, default parameters, and
output formats. The steps below are listed in the same order the code applies
them.
Pipeline overview (ordered)
- Read FASTA record(s).
  - designProbes uses only the first FASTA record.
  - designProbesBatch processes every record.
  - A FASTA header can override the channel using channel=....
  - Genome masking is enabled by default and requires a registered species (use fetchMouseIndex or buildGenomeIndex) or an explicit --index.
- Optional repeat masking (disabled by default).
  - If enabled, the input sequence is masked using RepeatMasker and masked bases are converted to N. Tiles containing masked bases are discarded later.
- Tile the target sequence (reverse-complemented).
  - The sequence is scanned with a step size of 1 to generate 52-nt tiles by default.
  - Each tile is reverse-complemented so probes are antisense to the target.
  - Any tile containing N is discarded immediately.
- Filter homopolymer runs (C and G).
  - Tiles are removed if they contain long runs of C's or G's beyond the allowed thresholds.
- Filter hairpins.
  - Each tile is screened with primer3 for self-hairpin structures.
  - Tiles with a predicted hairpin melting temperature >= 45 C are removed.
- Genome uniqueness filtering (Bowtie2).
  - Remaining tiles are aligned to the reference genome index.
  - Tiles with more than the allowed number of hits are removed.
- GC-content filtering.
  - Tiles must fall within the allowed GC% range.
- Gibbs free energy filtering.
  - Tiles with binding free energies outside the allowed range are removed.
- Split tiles into probe halves.
  - Each tile is split into two 25-mers (for a 52-mer tile), dropping the two middle bases to create a short gap.
- Optional dTm filtering.
  - If enabled, tiles are removed when the temperature difference between the two halves is too large.
- Select top N non-overlapping tiles.
  - Tiles are ranked by how close their Gibbs free energy is to the target.
  - Overlapping tiles are skipped to keep probes spread out.
- Add HCR initiators and spacers.
  - Channel-specific initiator sequences are appended to the probe halves.
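As a rough illustration of the tiling and splitting steps above, the following sketch uses only the Python standard library. The function names (reverse_complement, tile_target, split_tile) are hypothetical and are not taken from the tool's source. The sketch does not show which half becomes P1 or P2, and it omits every filter between tiling and splitting.

```python
# Hypothetical helper names; a sketch of the tiling steps, not the tool's code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_target(seq, tile_size=52, step=1):
    """Return (1-based start, reverse-complemented tile) pairs, skipping tiles with N."""
    tiles = []
    for i in range(0, len(seq) - tile_size + 1, step):
        window = seq[i:i + tile_size]
        if "N" in window.upper():
            continue  # masked or ambiguous base: discard the tile
        tiles.append((i + 1, reverse_complement(window)))
    return tiles


def split_tile(tile, gap=2):
    """Split a tile into two equal halves, dropping `gap` middle bases."""
    half = (len(tile) - gap) // 2
    return tile[:half], tile[len(tile) - half:]


target = "ACGT" * 14  # 56-nt toy sequence, not real data
tiles = tile_target(target)
left, right = split_tile(tiles[0][1])
print(len(tiles), len(left), len(right))  # 5 25 25
```

With a step size of 1, a target of length L yields at most L - 51 tiles of 52 nt, which is why the later filters and the non-overlap selection are needed to thin the candidates.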
Default parameters
These are the defaults used if you do not override them on the command line.
- --tileSize: 52
- --channel: B1
- --species: mouse
- --minGC: 45.0
- --maxGC: 55.0
- --targetGC: 50.0 (not currently used in ranking or filtering)
- --minGibbs: -70.0
- --maxGibbs: -50.0
- --targetGibbs: -60.0
- --dTmMax: 5.0
- --dTmFilter: off
- --maxRunLength: 7
- --maxRunMismatches: 2
- --maxProbes: 20
- --num-hits-allowed: 1
- --no-genomemask: off (genome masking is on by default)
- --no-repeatmask: on (repeat masking is disabled by default)

Other defaults:
- Hairpin filter threshold: 45 C (not currently configurable).
- Tile step size: 1 nt (not currently configurable).
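To make the GC and homopolymer defaults concrete, here is a minimal sketch of how such thresholds could be applied to a single tile. It rests on assumptions the page does not confirm: the GC bounds are treated as inclusive, a run is rejected only when it is strictly longer than --maxRunLength, and --maxRunMismatches is ignored entirely because its exact semantics are not described here. The names DEFAULTS, gc_percent, longest_run and passes_basic_filters are hypothetical.

```python
import re

# Values copied from the defaults listed above; the dict itself is illustrative.
DEFAULTS = {"minGC": 45.0, "maxGC": 55.0, "maxRunLength": 7}


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq, base):
    """Length of the longest uninterrupted run of `base` (0 if absent)."""
    runs = re.findall(f"{base}+", seq.upper())
    return max((len(r) for r in runs), default=0)


def passes_basic_filters(seq, cfg=DEFAULTS):
    """Assumed rule: GC% within [minGC, maxGC] and no C or G run above maxRunLength."""
    if not cfg["minGC"] <= gc_percent(seq) <= cfg["maxGC"]:
        return False
    return all(longest_run(seq, b) <= cfg["maxRunLength"] for b in "CG")


balanced = "ACGT" * 13                              # 50% GC, no long runs
c_run = "C" * 8 + "ACGT" * 9 + "AAAATTTT"           # 50% GC, but a run of 8 C's
print(passes_basic_filters(balanced))  # True
print(passes_basic_filters(c_run))     # False
```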
Thermodynamic parameters (what they mean and why they are used)
- GC%: Fraction of G/C bases in the tile. GC-rich probes bind more strongly because G-C base pairs have three hydrogen bonds. Filtering keeps probes within a moderate binding-strength window and improves uniformity.
- Tm (melting temperature): Predicted temperature at which half of probe-target duplexes would melt. Reported for each full tile using primer3. This provides a quick proxy for binding strength across candidates.
- dTm: The absolute difference in Tm between the two split probe halves. Large dTm values imply one half binds much more strongly than the other, which can reduce uniformity. Filtering is optional and off by default.
- Gibbs free energy (Gibbs FE): Predicted free energy of RNA/DNA binding, computed at 37 C with a salt correction (0.33 M). More negative values indicate stronger binding. Filtering removes probes that are too weak or too strong, and ranking prefers probes closest to --targetGibbs.
- Hairpin Tm: Predicted melting temperature of a self-hairpin in the probe sequence. Strong hairpins can compete with target binding, so tiles with a hairpin Tm >= 45 C are removed.
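The Gibbs-based ranking and the non-overlap selection could be combined roughly as follows. This is a sketch under stated assumptions, not the tool's implementation: tiles are represented as dicts with start, length and GibbsFE keys (mirroring the output columns), two tiles overlap when their 1-based intervals share any base, and ties are broken by input order. The function name select_non_overlapping and the example values are hypothetical.

```python
def select_non_overlapping(tiles, target_gibbs=-60.0, max_probes=20):
    """Greedily keep the tiles closest to target_gibbs that do not overlap earlier picks."""
    ranked = sorted(tiles, key=lambda t: abs(t["GibbsFE"] - target_gibbs))
    chosen = []
    for tile in ranked:
        if len(chosen) >= max_probes:
            break
        start, end = tile["start"], tile["start"] + tile["length"]  # end is exclusive
        clear = all(
            end <= c["start"] or start >= c["start"] + c["length"] for c in chosen
        )
        if clear:
            chosen.append(tile)
    return chosen


# Toy candidates with made-up free energies (kcal/mol).
candidates = [
    {"start": 1, "length": 52, "GibbsFE": -60.1},
    {"start": 10, "length": 52, "GibbsFE": -60.0},
    {"start": 70, "length": 52, "GibbsFE": -65.0},
]
picked = select_non_overlapping(candidates)
print([t["start"] for t in picked])  # [10, 70]
```

A greedy pass like this favors the best-scoring tile in each region, so a slightly worse tile that would have allowed denser packing can be skipped.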
Output formats
Primary TSV output
designProbes and designProbesBatch write a tab-delimited table with these
columns:
- name: Tile identifier formatted as record:start-end.
- probe: The reverse-complemented tile sequence.
- start: 1-based start position in the original target sequence.
- length: Tile length (default 52).
- P1: Probe half with the odd initiator appended.
- P2: Probe half with the even initiator appended.
- channel: HCR channel used (e.g., B1).
- GC: GC% of the tile.
- Tm: primer3 melting temperature of the full tile.
- dTm: Absolute Tm difference between the two halves.
- GibbsFE: Calculated Gibbs free energy of binding (kcal/mol).
Example (single row):
MyTarget:1-52\tacg...\t1\t52\tP1SEQ...\tP2SEQ...\tB1\t50.00\t70.12\t1.83\t-60.45
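For downstream scripting, a table with these columns can be loaded with the standard csv module. The sketch below assumes the column order listed above and tolerates an optional header row, since this page does not say whether one is written. The sequences in the sample row are short placeholders, not real probe sequences, and read_probe_table is a hypothetical name.

```python
import csv
import io

COLUMNS = ["name", "probe", "start", "length", "P1", "P2",
           "channel", "GC", "Tm", "dTm", "GibbsFE"]


def read_probe_table(handle):
    """Parse tab-delimited probe rows into dicts with numeric fields converted."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0] == "name":
            continue  # skip blank lines and a header row, if one is present
        row = dict(zip(COLUMNS, fields))
        for key in ("start", "length"):
            row[key] = int(row[key])
        for key in ("GC", "Tm", "dTm", "GibbsFE"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


# Placeholder sequences (ACGT, AAAA, CCCC) stand in for the elided ones.
sample = "MyTarget:1-52\tACGT\t1\t52\tAAAA\tCCCC\tB1\t50.00\t70.12\t1.83\t-60.45\n"
rows = read_probe_table(io.StringIO(sample))
print(rows[0]["name"], rows[0]["start"], rows[0]["GibbsFE"])  # MyTarget:1-52 1 -60.45
```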
IDT ordering output (optional)
If --idt is provided, an additional file is written with two columns:
Name and Sequence. Each tile produces two entries:
- {name}:{channel}:odd for P1
- {name}:{channel}:even for P2
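The naming pattern above can be reproduced from a parsed TSV row as in the following sketch. It only builds the (Name, Sequence) pairs; the delimiter and header layout of the actual IDT file are not specified on this page, so no file is written. The function name idt_entries and the placeholder sequences are hypothetical.

```python
def idt_entries(row):
    """Return the two (Name, Sequence) pairs for one tile: odd for P1, even for P2."""
    base = f"{row['name']}:{row['channel']}"
    return [(f"{base}:odd", row["P1"]), (f"{base}:even", row["P2"])]


# Placeholder sequences, not real probe halves.
row = {"name": "MyTarget:1-52", "channel": "B1", "P1": "AAAA", "P2": "CCCC"}
for name, sequence in idt_entries(row):
    print(name, sequence)
# MyTarget:1-52:B1:odd AAAA
# MyTarget:1-52:B1:even CCCC
```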
Genome masking artifacts
When genome masking is enabled, Bowtie2 writes a SAM file in the working
directory named {targetName}.sam. This file lists all alignments used to
compute hit counts.
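If you want to inspect the SAM file yourself, one plausible way to tally alignments per tile is sketched below. It relies only on the SAM format: header lines start with @, column 1 is the query name, column 2 is the FLAG, and FLAG bit 0x4 marks an unmapped read. How the tool itself derives hit counts from this file is not described here, so treat the counting rule and the name count_hits as assumptions.

```python
from collections import Counter


def count_hits(sam_lines):
    """Count mapped alignment records per query name; unmapped queries get 0."""
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            hits[qname] += 0  # unmapped: register the name without adding a hit
        else:
            hits[qname] += 1
    return hits


# Toy records with made-up coordinates.
sam = [
    "@HD\tVN:1.0\n",
    "t1\t0\tchr1\t100\t42\t52M\t*\t0\t0\tACGT\tIIII\n",
    "t1\t256\tchr2\t5\t1\t52M\t*\t0\t0\tACGT\tIIII\n",
    "t2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
print(dict(count_hits(sam)))  # {'t1': 2, 't2': 0}
```

For a real file, pass an open file handle, for example count_hits(open(path)), where path points at the {targetName}.sam file in the working directory.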