API Reference
HCRProbeDesign.probeDesign
Core probe design workflow and CLI entry points.
build_parser()
Build the argument parser for probe design CLIs.
calcOligoCost(tiles, pricePerBase=0.19)
Calculate the cost of the oligo library.
main()
Main entry point for HCR probe design; called when the module is run directly from the command line.
main_batch()
Batch probe design for multi-record FASTA inputs.
outputIDT(tiles, outHandle=sys.stdout)
Format tile output for direct ordering using the IDT template.
outputRunParams(args)
Print run parameters to stderr.
outputTable(tiles, outHandle=sys.stdout)
Format tile output and write it to outHandle.
scanSequence(sequence, seqName, tileStep=1, tileSize=52)
Given a sequence, a name for the sequence, a step size, and a tile size, return a list of Tile objects that tile across the sequence.
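To illustrate the tiling described above, here is a minimal sketch that slides a fixed-size window across a sequence. It is not the library's implementation: `tile_windows` is a hypothetical helper, it returns plain `(start, subsequence)` tuples rather than Tile objects, and it assumes 0-based start positions and that only full-length windows are kept.

```python
def tile_windows(sequence, tile_step=1, tile_size=52):
    """Yield (start, subsequence) for every full-length window (hypothetical helper)."""
    last_start = len(sequence) - tile_size
    # If the sequence is shorter than tile_size, the range is empty.
    for start in range(0, last_start + 1, tile_step):
        yield start, sequence[start:start + tile_size]


# A 60-base toy sequence, tiled with a step of 4 and the default tile size.
windows = list(tile_windows("ACGT" * 15, tile_step=4, tile_size=52))
```

With a 60-base input, a tile size of 52, and a step of 4, the windows start at positions 0, 4, and 8.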
test()
Manual test harness with hard-coded parameters and FASTA file.
HCRProbeDesign.tiles
Tile and probe representation used in the design pipeline.
Tile(sequence, seqName, startPos)
Represents a candidate probe tile extracted from a target sequence.
Initialize a Tile from a sequence and positional metadata.
GC()
Return GC percentage for the tile sequence.
RajTm()
Return the SantaLucia-style melting temperature estimate.
Tm()
Return the basic melting temperature estimate for the tile.
__cmp__(other)
Legacy comparison for sorting tiles by name and position.
__eq__(other)
Compare tiles by their sequences.
__hash__()
Hash tiles by their sequence.
__iter__()
Iterate over the tile sequence bases.
__len__()
Return the length of the tile sequence.
__repr__()
Return a debug-friendly representation of the tile.
__str__()
Return a human-readable representation of the tile.
calcGibbs()
Calculate the Gibbs free energy of binding for a given sequence.
calcdTm()
Calculate the difference in melting temperature between the 5' and 3' sequences.
distance(b, enforceStrand=False)
Return the absolute distance between the start positions of self and another interval.
hasRuns(runChar, runLength, mismatches)
Given a run character, a run length, and a number of mismatches, return True if the sequence has a run of that character of that length with the specified number of mismatches.
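One way to read the run check above is as a sliding window: a window of `runLength` bases counts as a run if at most `mismatches` of its bases differ from `runChar`. The sketch below follows that reading. It is an assumption about the semantics, not the library's code; `has_run` is a hypothetical standalone function, and the comparison is case-sensitive.

```python
def has_run(sequence, run_char, run_length, mismatches=0):
    """Return True if any window of run_length has <= mismatches non-run_char bases."""
    for start in range(len(sequence) - run_length + 1):
        window = sequence[start:start + run_length]
        # Bases in the window that are not the run character count as mismatches.
        if run_length - window.count(run_char) <= mismatches:
            return True
    return False
```

Under this reading, `"ACGGAGGT"` has no exact run of five G's, but it does have one when a single mismatch is allowed (the window `GGAGG`).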
isMasked()
Return True if the tile contains masked bases.
makeProbes(channel)
Create the probes for the given channel.
overlaps(b)
Return True if b overlaps self.
splitProbe()
Split the sequence in half with the two middle bases removed (a flexible gap to help the initiator sequence land); i.e., a 52mer is split into two 25mers and the middle two bases of the 52mer are dropped.
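The 52mer example above can be sketched as plain string slicing. This is a minimal illustration rather than the method itself: `split_with_gap` is a hypothetical function that works on a bare string, and it assumes the gap is centered in a sequence whose length minus the gap is even.

```python
def split_with_gap(sequence, gap=2):
    """Split a sequence into two halves, dropping `gap` bases from the middle."""
    half = (len(sequence) - gap) // 2
    left = sequence[:half]
    right = sequence[half + gap:]
    return left, right


# A 52mer built so the two dropped middle bases ("GG") are easy to spot.
left, right = split_with_gap("A" * 25 + "GG" + "T" * 25)
```

Here `left` is the 25 A's and `right` is the 25 T's; the two G's in the middle are discarded.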
toBed()
Placeholder for BED formatting support.
toFasta()
Return the tile formatted as a FASTA record.
validate()
Run lightweight validation checks on the tile.
TileError(value)
Bases: Exception
Custom exception type for tile validation and processing.
HCRProbeDesign.genomeMask
Genome masking utilities based on Bowtie2 alignments.
countHitsFromSam(samFile)
For each read in the SAM file, add 1 to the count of hits for that read.
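As a rough illustration of per-read hit counting, the sketch below tallies alignment records by read name from SAM-formatted lines. It is not the library's implementation: `count_hits` is a hypothetical function that takes an iterable of lines instead of a file path, and it assumes that header lines (starting with `@`) and unmapped records (FLAG bit 0x4 set) should not count as hits.

```python
from collections import Counter


def count_hits(sam_lines):
    """Count alignment records per read name (QNAME) in SAM-formatted lines."""
    hits = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        read_name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # assumption: unmapped records are not hits
        hits[read_name] += 1
    return hits


# Toy records: tile_1 aligns twice, tile_2 is unmapped.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "tile_1\t0\tchr1\t100\t42\t52M\t*\t0\t0\t*\t*",
    "tile_1\t256\tchr2\t500\t1\t52M\t*\t0\t0\t*\t*",
    "tile_2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
hit_counts = count_hits(example)
```

A tile with more than one hit aligns to multiple places in the genome, which is the kind of signal a masking step could use to discard it.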
genomemask(fasta_string, handleName='tmp', species='mouse', nAlignments=3, index=None)
Run Bowtie2 to align probe tiles and write a SAM file to disk.
install_index(url='https://genome-idx.s3.amazonaws.com/bt/mm10.zip', genome='mm10', species='mouse')
Download and extract a prebuilt Bowtie2 index into the package indices.
test()
Quick manual test for Bowtie2 masking and SAM parsing.
HCRProbeDesign.referenceGenome
Build and register Bowtie2 indices for reference genomes.
build_bowtie2_index(fasta_paths, species, index_name=None, indices_dir=None, threads=1, force=False)
Build a Bowtie2 index from the provided FASTA files.
collect_fasta_inputs(paths)
Resolve FASTA inputs from files and directories.
format_index_path(index_prefix)
Format an index prefix relative to the package when possible.
load_config(config_path=DEFAULT_CONFIG_PATH)
Load the HCRconfig.yaml file.
main()
CLI entry point for building and registering a reference genome index.
register_species(config_path, species, index_prefix, force=False)
Register a species and its Bowtie2 index prefix in the config file.
save_config(config, config_path=DEFAULT_CONFIG_PATH)
Write configuration data to HCRconfig.yaml.
HCRProbeDesign.sequencelib
Sequence parsing and utility functions.
FastaIterator(handle)
Generator function to iterate over FASTA records in handle.
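For readers unfamiliar with the pattern, a generator over FASTA records can be sketched as follows. This is an illustration only: `iter_fasta` is a hypothetical function, and the choice to yield `(name, sequence)` tuples is an assumption, since the record type produced by the documented function is not stated here.

```python
import io


def iter_fasta(handle):
    """Yield (name, sequence) for each record in a FASTA-formatted handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


records = list(iter_fasta(io.StringIO(">a desc\nACGT\nAC\n>b\nGG\n")))
```

Because it is a generator, records are produced one at a time, so a large multi-record file does not have to be held in memory.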
GenRandomSeq(length, type='DNA')
Generate a random DNA or RNA sequence of a given length.
allindices(string, sub, listindex=[], offset=0)
Return a list of all indices of a substring in a string.
complement(s)
Return the complement of a DNA sequence.
draw(distribution)
Draw a random value from the distribution, where values with a higher probability are drawn more often.
find_all(seq, sub)
Find all occurrences of a substring in a string.
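A substring search of this kind can be sketched with repeated calls to `str.find`. The version below is a hypothetical stand-in, not the library's code: `find_all_positions` is an invented name, and it assumes overlapping matches should be reported.

```python
def find_all_positions(seq, sub):
    """Yield the start index of every (possibly overlapping) occurrence of sub."""
    start = seq.find(sub)
    while start != -1:
        yield start
        # Advance by one so overlapping occurrences are also found.
        start = seq.find(sub, start + 1)


positions = list(find_all_positions("AAAA", "AA"))
```

Advancing by one position instead of by `len(sub)` is what makes the overlapping matches at 0, 1, and 2 all appear.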
gc_content(seq)
Given a DNA sequence, return the percentage of G's and C's in the sequence.
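The GC percentage is the number of G and C bases divided by the sequence length, times 100. A minimal sketch, assuming a 0 to 100 scale, case-insensitive counting, and 0.0 for an empty sequence (`gc_percent` is a hypothetical name, and these edge-case choices may differ from the library's):

```python
def gc_percent(seq):
    """Return the percentage (0-100) of G and C bases in seq."""
    if not seq:
        return 0.0  # assumption: avoid dividing by zero on empty input
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)
```

For example, `gc_percent("GGCCAATT")` is 50.0, since four of the eight bases are G or C.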
genRandomFromDist(length, freqs)
Generate a random sequence of length 'length' by drawing from a distribution of base frequencies given as a dictionary.
getGC(seq)
Return the GC content of a DNA sequence string.
getTm(seq)
Return the melting temperature of a sequence.
get_seeds(iter, seeds={})
Given a list of sequences, return a dictionary of the counts of each seed.
kmer_dictionary(seq, k, dic={}, offset=0)
Return a dictionary mapping each k-mer to the list of its start positions in seq.
kmer_dictionary_counts(seq, k, dic={})
Return a dictionary mapping each k-mer to its count in seq.
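The two k-mer dictionaries above differ only in what they store per k-mer: start positions versus a count. The sketch below shows both shapes on a toy sequence. The names `kmer_positions` and `kmer_counts` are hypothetical, the sketch ignores the `dic` and `offset` parameters, and it assumes 0-based positions.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer to the list of its 0-based start positions in seq."""
    positions = defaultdict(list)
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    return dict(positions)


def kmer_counts(seq, k):
    """Map each k-mer to the number of times it occurs in seq."""
    return {kmer: len(starts) for kmer, starts in kmer_positions(seq, k).items()}


toy_positions = kmer_positions("ATATA", 2)
toy_counts = kmer_counts("ATATA", 2)
```

One general Python caveat applies to signatures with a `dic={}` default: a mutable default is created once and shared across calls, so if a function adds entries to it, passing a fresh dictionary explicitly avoids carrying results over from an earlier call.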
kmer_stats(kmer, dic, genfreqs)
Take a k-mer string, a dictionary with k-mers as keys (from kmer_dictionary_counts), and a dictionary of genomic frequencies with k-mers as keys. Return a dictionary of stats for the k-mer ("Signal2Noise Ratio, Z-score").
makeDistFromFreqs(freqs)
Given a dictionary of character frequencies, return a list of cumulative frequencies.
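Cumulative frequencies are what make weighted random draws cheap: pick a uniform random number and find the first cumulative value that exceeds it. The sketch below shows the idea under stated assumptions. `cumulative_distribution` and `draw_symbol` are hypothetical names, and the return shape (parallel lists of symbols and running totals) is a guess at one reasonable layout, not the library's actual data structure.

```python
import bisect
import random


def cumulative_distribution(freqs):
    """Return (symbols, running totals) built from a {symbol: frequency} dict."""
    symbols, cumulative, total = [], [], 0.0
    for symbol, freq in freqs.items():
        total += freq
        symbols.append(symbol)
        cumulative.append(total)
    return symbols, cumulative


def draw_symbol(symbols, cumulative, rng=random):
    """Draw one symbol; symbols with larger frequencies are drawn more often."""
    r = rng.random() * cumulative[-1]
    # First position whose running total is strictly greater than r.
    index = bisect.bisect_right(cumulative, r)
    return symbols[min(index, len(symbols) - 1)]  # clamp against float rounding


symbols, cumulative = cumulative_distribution({"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4})
random_base = draw_symbol(symbols, cumulative)
```

Scaling the random number by the final running total means the frequencies do not need to sum exactly to 1.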
mcount(s, chars)
Sum the counts of appearances of each char in chars.
prob_seq(seq, pGC=0.5)
Given a sequence and a background GC probability, return the probability of getting that sequence.
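One common model for this kind of calculation treats bases as independent, with G and C each having probability pGC/2 and A and T each having probability (1 - pGC)/2. The sketch below uses that model. Whether the library uses exactly this model is an assumption, and `sequence_probability` is a hypothetical name.

```python
def sequence_probability(seq, p_gc=0.5):
    """Probability of seq under an independent-base model with GC fraction p_gc."""
    p_each_gc = p_gc / 2.0          # probability of G, and of C
    p_each_at = (1.0 - p_gc) / 2.0  # probability of A, and of T (or U)
    prob = 1.0
    for base in seq.upper():
        if base in "GC":
            prob *= p_each_gc
        elif base in "ATU":
            prob *= p_each_at
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return prob
```

With the default `p_gc=0.5`, every base has probability 0.25, so a 4-base sequence has probability 0.25 to the fourth power.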
rcomp(s)
Does the same thing as reverse_complement, only cooler.
reverse_complement(s)
Return the reverse complement of a DNA sequence.
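Reverse complementing means swapping each base for its pairing partner and then reversing the string. A minimal sketch using `str.translate` (the name `revcomp` is hypothetical, and leaving characters such as N unchanged is an assumption about the desired behavior):

```python
# Translation table for the four DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(s):
    """Return the reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]
```

For example, `revcomp("AACG")` gives `"CGTT"`: the complement is `TTGC`, which is then reversed.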
seed()
Seed the random number generator with system entropy.
transcribe(seq)
Replace each T in a DNA sequence with uracil (U) and return the transcribed RNA sequence.
HCRProbeDesign.thermo
@authors: Marshall J. Levesque, Arjun Raj, Daniel Wei
Tm(sequence)
Calculate the melting temperature of a sequence.
Tm_RNA_DNA(sequence)
Given a sequence, return its Tm using the SantaLucia 98 parameters.
containsAny(astring, aset)
Check whether a string contains any of the given characters.
gibbs(dH, dS, temp=37)
Calculate Gibbs free energy in cal/mol from enthalpy, entropy, and temperature.
Arguments:
dH -- enthalpy in kcal/mol
dS -- entropy in cal/(mol * Kelvin)
temp -- temperature in Celsius (default 37 degrees C)
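The units above imply two conversions before applying ΔG = ΔH − TΔS: enthalpy from kcal/mol to cal/mol, and temperature from Celsius to Kelvin. A minimal sketch of that arithmetic, assuming a 273.15 offset and a result in cal/mol as the description states (`gibbs_free_energy` is a hypothetical name, not the library function):

```python
def gibbs_free_energy(dH_kcal, dS_cal, temp_c=37.0):
    """Return delta G in cal/mol from dH (kcal/mol), dS (cal/(mol*K)), temp (Celsius)."""
    temp_k = temp_c + 273.15  # Celsius to Kelvin
    # Convert dH to cal/mol so both terms share the same unit.
    return dH_kcal * 1000.0 - temp_k * dS_cal
```

For example, with dH = -10 kcal/mol and dS = -30 cal/(mol*K) at 37 degrees C, the result is -10000 + 310.15 × 30 = -695.5 cal/mol.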
init_dna_dna(inseq)
Return [enthalpy, entropy] list with units kcal/mol and cal/(mol*Kelvin) for DNA/DNA duplex initiation for the input DNA sequence (actg, 5'->3'). Values from SantaLucia 1998. The argument is DNA.
init_rna_dna()
Return [enthalpy, entropy] list in kcal/mol and cal/(mol*Kelvin) for RNA/DNA duplex initiation. Values from Sugimoto et al. 1995.
melting_temp(dH, dS, ca, cb, salt)
Calculate the melting temperature of a DNA sequence.
overhang_dna(inseq, end)
Return the Gibbs free energy contribution at 37 degC (in kcal/mol) from a single-base overhang in a DNA/DNA duplex.
Arguments:
inseq -- 2bp DNA sequence (5' -> 3')
end -- specifies which end the overhang is on (valid values: 3 or 5)
Table 2 in Bommarito, S. (2000). Nucleic Acids Research.
overhang_rna(inseq, end)
Return the Gibbs free energy contribution at 37 degC (in kcal/mol) from a single-base overhang in an RNA/RNA duplex.
Arguments:
inseq -- 2bp RNA sequence (5' -> 3'), uracil->thymidine
end -- specifies which end the overhang is on (valid values: 3 or 5)
Table 3 in Freier et al., Biochemistry, 1986.
salt_adjust(delG, nbases, saltconc)
Adjust Gibbs free energy from 1 M Na+ to another concentration.
Arguments:
delG -- Gibbs free energy in kcal/mol
nbases -- number of bases in the sequence
saltconc -- desired Na+ concentration for the new Gibbs free energy calculation
Equation 7 in SantaLucia 1998.
stacks_dna_dna(inseq, temp=37)
Calculate thermodynamic values for DNA/DNA hybridization.
Input Arguments:
inseq -- the input DNA sequence of the DNA/DNA hybrid (5'->3')
temp -- temperature in Celsius for the Gibbs free energy calculation (default 37 degC)
salt -- salt concentration in units of mol/L (default 0.33M)
Return [enthalpy, entropy] list in kcal/mol and cal/(mol*Kelvin).
stacks_rna_dna(inseq)
Calculate RNA/DNA base stack thermodynamic values (Sugimoto et al. 1995).
Sugimoto 95 parameters for RNA/DNA hybridization (Table 3), "Thermodynamic Parameters To Predict Stability of RNA/DNA Hybrid Duplexes", Biochemistry 1995.
Input Arguments:
inseq -- RNA sequence of the RNA/DNA hybrid (5'->3', uracil->thymidine)
Return [enthalpy, entropy] list in kcal/mol and cal/(mol*Kelvin).
HCRProbeDesign.utils
Miscellaneous utilities for sequence processing and formatting.
FastaIterator(handle)
Generator function to iterate over FASTA records in handle.
buildTags(numTags, tagLength, sites=None)
Generate random DNA tags with optional restriction site filtering.
eprint(*args, **kwargs)
Print to stderr with the same signature as print().
estimateAffixLength(sequence, tagLength)
Estimate sequence length after tag insertion.
findUnique(tiles)
Return a list of unique Tile objects from the input list.
hasRestrictionSites(sequence, sites)
Check if a sequence contains restriction sites.
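A restriction-site check of this kind reduces to a substring test against each recognition sequence. The sketch below is a hypothetical stand-in (`contains_any_site` is an invented name), and it assumes `sites` is an iterable of recognition sequences written as plain strings, that matching is case-insensitive, and that only the given strand is searched.

```python
def contains_any_site(sequence, sites):
    """Return True if any recognition sequence in sites occurs in sequence."""
    seq = sequence.upper()
    return any(site.upper() in seq for site in sites)


# GAATTC (EcoRI) and GGATCC (BamHI) as example recognition sequences.
has_site = contains_any_site("aaGAATTCaa", ["GAATTC", "GGATCC"])
```

Sites that are not palindromic would also need a check against the reverse complement to cover both strands, which this sketch does not do.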
onlyNucleic(seq, set=['a', 'c', 'g', 't', 'u', 'A', 'C', 'G', 'T', 'U', 'n', 'N', '@'])
Check whether a sequence contains only nucleic characters.
pp(d, level=-1, maxw=0, maxh=0, parsable=0)
Wrapper around pretty_print that prints to stdout.
pretty_print(f, d, level=-1, maxw=0, maxh=0, gap='', first_gap='', last_gap='')
Pretty-print nested structures to a file handle.
warnRestrictionSites(sequence, name, sites)
Print a warning if restriction sites are found in a sequence.
HCRProbeDesign.BLAST
NCBI BLAST utilities for probe sequence validation.
blastProbes(fasta_string, species='mouse', verbose=True)
Submit a BLASTN job for the given FASTA string.
getNHits(blast_handle, verbose=True)
Report the number of hits for each record in a BLAST response.
HCRProbeDesign.repeatMask
RepeatMasker web API helper utilities (deprecated).
repeatmask(sequence, dnasource='mouse')
Take a sequence and return a masked sequence, using RepeatMasker.
repeatmasker_local(sequence, dnasource='mouse')
Placeholder for a local RepeatMasker wrapper.
test()
Simple smoke test for the remote RepeatMasker flow.