Score

  • Scores are normalised so that 100 is the maximum possible score.
  • The PromScan algorithm assigns a score, Iseq, to each potential promoter site using a formula based on the base frequencies fb,i and pb described below.
  • The score, Iseq, is known as the relative entropy or the Kullback-Leibler distance.
  • Since it is a log-likelihood ratio statistic, it can be used to estimate the statistical significance of the sequence pattern.
  • The best justification for using this formula is that Iseq is related to the binding energy of the protein to the site (Stormo, 2000).
  • The PromScan program obtains the values of pb, the background frequency of each base b, by calculating the base composition of the input sequence.
  • Values of fb,i, the frequency of base b at position i of the site, were calculated from the base frequencies in 186 known binding sites compiled by Barrios et al. (1999).
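
The scoring idea above can be illustrated with a short Python sketch. It is not PromScan's actual code. The relative entropy of one position is conventionally written as the sum over bases b of fb,i × log2(fb,i / pb); the sketch assumes that an individual site is scored by summing log2(fb,i / pb) for the base observed at each position, and that normalisation divides by the best achievable score. The function names and the four-position frequency matrix are made up for illustration and are not the Barrios et al. data.

```python
import math

BASES = "ACGT"

def background(seq):
    """Estimate p_b from the base composition of the input sequence."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in BASES)
    return {b: seq.count(b) / total for b in BASES}

def site_score(site, freqs, p):
    """Sum log2(f_{b,i} / p_b) over the bases of one candidate site.

    Assumes no frequency is zero; real matrices usually need pseudocounts.
    """
    return sum(math.log2(freqs[i][b] / p[b]) for i, b in enumerate(site.upper()))

def normalised_score(site, freqs, p):
    """Rescale so that the best possible site scores 100 (assumed scheme)."""
    best = sum(max(math.log2(col[b] / p[b]) for b in BASES) for col in freqs)
    return 100.0 * (site_score(site, freqs, p) / best)

# Toy 4-position frequency matrix (hypothetical values, for illustration only).
toy_freqs = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
p = background("ACGTACGTTTGGCCAA")  # every base has frequency 0.25 here

best_site = normalised_score("TGGC", toy_freqs, p)
poor_site = normalised_score("AAAA", toy_freqs, p)
```

In this toy case the site matching the most frequent base at every position receives the maximum score, while a site built from rare bases receives a negative one.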

Gene/ORF

  • Microbial genome sequences can be downloaded from the NCBI FTP site as .fna files in FASTA format.
  • Protein-table files (.ptt) are also available, containing information about known and predicted open reading frames (ORFs).
  • PromScan can use these .ptt files to determine whether each potential promoter lies in a coding or non-coding (intergenic) region of DNA, and which ORF it lies in or near.
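
As a rough illustration of the coding/intergenic check described above, the Python sketch below classifies a position against a list of ORF intervals. It assumes that ORF locations are written as "start..end" strings, as in the location column of a .ptt file; the ORF names, coordinates and function names are hypothetical, and this is not PromScan's own implementation.

```python
def parse_location(field):
    """Parse a 'start..end' location string into (start, end) integers."""
    start, end = field.split("..")
    return int(start), int(end)

def classify(position, orfs):
    """Return ('coding', name) if the position lies inside an ORF,
    otherwise ('intergenic', name of the nearest ORF)."""
    for start, end, name in orfs:
        if start <= position <= end:
            return "coding", name
    # Distance to an ORF is measured to its closer end.
    nearest = min(orfs, key=lambda o: min(abs(position - o[0]), abs(position - o[1])))
    return "intergenic", nearest[2]

# Hypothetical ORF records, for illustration only.
orfs = [
    (*parse_location("190..255"), "orfA"),
    (*parse_location("337..2799"), "orfB"),
]

inside = classify(200, orfs)
between = classify(300, orfs)
```

A linear scan is enough for a sketch; for a whole genome, sorting the ORFs and using a binary search would scale better.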

Sequence

The sequence of the potential promoter is given. This is a 16-base sequence and should be similar to the consensus: yTGGCACGrnnnTTGCw.
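
One simple way to judge similarity to the consensus is to count mismatches against its IUPAC ambiguity codes (y = C or T, r = A or G, w = A or T, n = any base). The Python sketch below does this for the consensus string exactly as written above; the function name is hypothetical, the example sites are made up, and the site passed in must be the same length as the consensus string.

```python
# IUPAC codes used in the consensus, mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT",
}

CONSENSUS = "yTGGCACGrnnnTTGCw"

def mismatches(site, consensus=CONSENSUS):
    """Count positions where the site's base is not allowed by the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(
        base not in IUPAC[code]
        for base, code in zip(site.upper(), consensus.upper())
    )

perfect = mismatches("CTGGCACGAAAATTGCA")   # agrees at every position
one_off = mismatches("CTGGCACGAAAATTGCG")   # last base violates 'w'
```

A mismatch count treats all positions equally, unlike the weighted score, so it is only a quick sanity check.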

Position

This indicates the position of the predicted promoter in the query sequence. A "+" indicates that the site is on the top strand, and a "-" indicates that it is on the bottom (i.e. the complementary) strand.
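
The strand convention can be illustrated with a small Python sketch that looks for a short motif on both strands of a query. A site on the bottom strand appears in the top-strand sequence as its reverse complement. The sketch assumes that positions are 1-based and refer to the leftmost base on the top strand, which may differ from PromScan's own convention; the function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_site(query, site):
    """Return (position, strand) for each exact occurrence of the site.

    Positions are 1-based and counted on the top strand (an assumption).
    """
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = query.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = query.find(pattern, start + 1)
    return hits

hits = find_site("AATGGCTTGCCAAA", "TGGC")
```

Here the motif occurs once on the top strand and once, as GCCA in the top-strand sequence, on the bottom strand.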


This page created and maintained by David J. Studholme. Last updated: Sun Jun 9 09:20:08 GMT 2002