Score
- Scores have been normalised so that 100 is the maximum possible score.
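- The PromScan algorithm assigns a score to each potential promoter site using the formula
  $$I_{seq} = \sum_{i=1}^{L} \sum_{b \in \{A,C,G,T\}} s_{b,i} \log_2 \frac{f_{b,i}}{p_b}$$
  where $L$ is the length of the site, $s_{b,i}$ is 1 if base $b$ occurs at position $i$ of the site and 0 otherwise, $f_{b,i}$ is the frequency of base $b$ at position $i$ in known binding sites, and $p_b$ is the background probability of base $b$.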
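- The score, Iseq, is closely related to the relative entropy, or Kullback-Leibler distance, between the binding-site base frequencies and the background base frequencies: its average over the known binding sites equals that relative entropy.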
- Since it is a log-likelihood ratio statistic, it can be used to estimate
the statistical significance of the sequence pattern.
- The best justification for using this formula
is that Iseq is related to the binding energy of the protein to the site
(Stormo, 2000).
- PromScan determines the values of $p_b$ from the base composition of the input sequence.
- Values of $f_{b,i}$ were calculated from the base frequencies in 186 known binding sites compiled by Barrios et al. (1999).
- Microbial genome sequences can be downloaded from the NCBI ftp site as .fna files in FASTA format.
- Protein-table files (.ptt) are also available, containing information about known and predicted open reading frames (ORFs).
- PromScan can use these .ptt files to determine whether each potential promoter lies in a coding or non-coding
(intergenic) region of DNA, and which ORF it lies in or near.
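The scoring steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not PromScan's own implementation: it estimates the background probabilities from the input sequence's base composition, estimates the position frequencies from equal-length aligned sites with a hypothetical pseudocount of 0.5 (so that no log of zero is taken), sums log2(f/p) over the bases actually present in each window, and assumes the normalisation is 100 × Iseq divided by the highest attainable Iseq (the log base cancels out in that ratio). Only the forward strand is scanned. The function names and the toy sites are invented for illustration and are not the Barrios et al. (1999) data.

```python
import math
from collections import Counter

BASES = "ACGT"


def background_probs(seq):
    """Estimate p_b from the base composition of the input sequence.

    Assumes all four bases occur at least once (true for any real genome)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in BASES}


def position_freqs(sites, pseudocount=0.5):
    """Estimate f_{b,i} from equal-length aligned sites.

    The pseudocount is an assumption, added so that no f_{b,i} is zero."""
    length = len(sites[0])
    denominator = len(sites) + len(BASES) * pseudocount
    freqs = []
    for i in range(length):
        column = Counter(site[i].upper() for site in sites)
        freqs.append({b: (column[b] + pseudocount) / denominator for b in BASES})
    return freqs


def iseq(site, freqs, bg):
    """Iseq: sum over positions of log2(f_{b,i} / p_b) for the base b seen at position i."""
    return sum(math.log2(freqs[i][b] / bg[b]) for i, b in enumerate(site.upper()))


def max_iseq(freqs, bg):
    """Highest attainable Iseq: the best-scoring base chosen at every position."""
    return sum(max(math.log2(col[b] / bg[b]) for b in BASES) for col in freqs)


def normalised_score(site, freqs, bg):
    """Assumed normalisation: 100 * Iseq / max Iseq, so a perfect site scores 100."""
    return 100.0 * iseq(site, freqs, bg) / max_iseq(freqs, bg)


def scan(genome, freqs, bg):
    """Score every forward-strand window; windows containing non-ACGT characters are skipped."""
    width = len(freqs)
    genome = genome.upper()
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if all(b in BASES for b in window):
            hits.append((start, normalised_score(window, freqs, bg)))
    return hits


if __name__ == "__main__":
    toy_sites = ["TGGCA", "TGGCA", "TGGCT"]  # invented example sites
    genome = "ACGTTGGCAACGT"
    freqs = position_freqs(toy_sites)
    bg = background_probs(genome)
    for start, score in sorted(scan(genome, freqs, bg), key=lambda h: -h[1])[:3]:
        print(start, round(score, 1))
```

Dividing by the best attainable Iseq for the same matrix and background is one simple way to obtain a ceiling of exactly 100; sites can still score below zero when they match the background better than the binding-site profile.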
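The coding/intergenic check can be sketched in the same way. This sketch assumes the usual NCBI .ptt layout: a few header lines, then tab-separated rows whose first column is a `start..end` location, followed by strand, length, PID, gene and synonym columns. A position outside every ORF is treated as intergenic, and the ORF with the closest end is reported as the one it lies near. That "nearest ORF" rule, the fact that strand is ignored, and the names `parse_ptt`, `classify` and `Orf` are hypothetical choices for illustration, not PromScan's documented behaviour. ORFs that wrap around the origin of a circular genome are not handled.

```python
import re
from dataclasses import dataclass

# A .ptt data row starts with a location such as "190..255".
LOCATION = re.compile(r"^(\d+)\.\.(\d+)$")


@dataclass
class Orf:
    start: int
    end: int
    strand: str
    name: str


def parse_ptt(text):
    """Parse ORFs from .ptt text, skipping header lines that lack a start..end location."""
    orfs = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        match = LOCATION.match(fields[0])
        if match is None or len(fields) < 6:
            continue
        gene, synonym = fields[4], fields[5]
        name = gene if gene not in ("", "-") else synonym  # fall back to the locus tag
        orfs.append(Orf(int(match.group(1)), int(match.group(2)), fields[1], name))
    return sorted(orfs, key=lambda orf: orf.start)


def classify(position, orfs):
    """Return ("coding", orf) if the 1-based position lies inside an ORF,
    otherwise ("intergenic", orf whose nearer end is closest)."""
    nearest, best_distance = None, None
    for orf in orfs:
        if orf.start <= position <= orf.end:
            return "coding", orf
        distance = min(abs(position - orf.start), abs(position - orf.end))
        if best_distance is None or distance < best_distance:
            nearest, best_distance = orf, distance
    return "intergenic", nearest


if __name__ == "__main__":
    toy_ptt = (
        "Toy genome - 1..1000\n"
        "2 proteins\n"
        "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
        "100..400\t+\t100\t1\tabcA\tb0001\t-\t-\thypothetical protein\n"
        "600..900\t-\t100\t2\t-\tb0002\t-\t-\thypothetical protein\n"
    )
    orfs = parse_ptt(toy_ptt)
    for pos in (200, 450, 580):
        region, orf = classify(pos, orfs)
        print(pos, region, orf.name)
```

A linear scan keeps the sketch short; for a whole genome, a binary search over the sorted start coordinates (for example with `bisect`) would avoid checking every ORF for every candidate site.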