Parameters for Synteny Identification


The process of synteny identification is highly sensitive to the choice of parameters. The Cinteny server allows users to adjust these parameters and update the results.

Aggregation of Synteny Blocks

Perfect synteny is rarely encountered in real datasets. Large synteny blocks are often disrupted by small regions that are out of order. The aim of the synteny block identification algorithm is to ignore such small disruptions. The following parameters are used in the process.

  • Minimum Length of Synteny Block: All the synteny blocks smaller than this value are ignored.
  • Maximum Gap between Adjacent Markers: If the distance between two adjacent synteny blocks is less than this value, they are combined to form one bigger block.
  • Minimum Number of Markers: Only the synteny blocks that contain at least this number of markers are retained.

The "No aggregation" option sets the first two parameters to zero. Effectively, no aggregation takes place and only purely conserved regions are found.
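
The effect of the three parameters can be illustrated with a minimal Python sketch. It is not Cinteny's actual implementation: the Block type, the aggregate_blocks function, the single-chromosome coordinates, and the order of operations (merge first, then filter) are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Block:
    start: int    # start coordinate on one chromosome (hypothetical units)
    end: int      # end coordinate
    markers: int  # number of markers inside the block

def aggregate_blocks(blocks, min_length=0, max_gap=0, min_markers=1):
    """Merge nearby blocks, then drop blocks that are too short or too sparse.

    Assumption: merging happens before filtering; the real server may differ.
    """
    merged = []
    for b in sorted(blocks, key=lambda blk: blk.start):
        if merged:
            last = merged[-1]
            # Overlapping blocks are treated as having a gap of zero.
            gap = max(0, b.start - last.end)
            if gap < max_gap:
                # Combine the two blocks into one bigger block.
                merged[-1] = Block(last.start, max(last.end, b.end),
                                   last.markers + b.markers)
                continue
        merged.append(Block(b.start, b.end, b.markers))
    # Ignore blocks shorter than min_length or with fewer than min_markers.
    return [b for b in merged
            if b.end - b.start >= min_length and b.markers >= min_markers]

blocks = [Block(0, 100, 5), Block(120, 300, 8), Block(1000, 1010, 1)]
aggregated = aggregate_blocks(blocks, min_length=50, max_gap=50, min_markers=2)
unaggregated = aggregate_blocks(blocks)  # first two parameters left at zero
```

With these made-up inputs, the first two blocks are separated by a gap of 20 and are merged, while the short single-marker block is dropped. With the first two parameters at zero, no blocks are merged or removed.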

Choice of Paralog

Degenerate markers (multiple copies of the same gene or sequence tag) pose a problem in the identification of synteny blocks and the measurement of reversal distance. Cinteny offers users several strategies for dealing with them.

  • Random: One of the paralogs is chosen at random and the remaining ones are discarded.
  • First Found: The paralog first found in the data set is retained and the rest are discarded. Effectively, this is the same as random but the selection is deterministic for a particular dataset.
  • Conserved: The paralog which lies in the longest conserved segment of the genome is selected and the rest are discarded.
  • Remove All: All the markers for which paralogs exist are removed.
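
The four strategies can be sketched in Python as follows. This is only an illustration under stated assumptions, not the server's code: the choose_paralogs function, the tuple layout (gene_id, position, segment_len), and the idea that the length of the conserved segment around each copy is already known are all hypothetical.

```python
import random
from collections import defaultdict

def choose_paralogs(markers, strategy="first", seed=None):
    """Resolve degenerate markers.

    markers: list of (gene_id, position, segment_len) tuples in data-set
    order, where segment_len is an assumed, precomputed length of the
    conserved segment containing that copy.
    """
    groups = defaultdict(list)
    for m in markers:
        groups[m[0]].append(m)  # group the copies of each gene

    rng = random.Random(seed)
    kept = []
    for copies in groups.values():
        if len(copies) == 1:
            kept.append(copies[0])            # unique marker, always kept
        elif strategy == "random":
            kept.append(rng.choice(copies))   # one copy chosen at random
        elif strategy == "first":
            kept.append(copies[0])            # first copy found in the data
        elif strategy == "conserved":
            # copy lying in the longest conserved segment
            kept.append(max(copies, key=lambda m: m[2]))
        elif strategy == "remove_all":
            continue                          # drop every copy of the gene
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return sorted(kept, key=lambda m: m[1])   # restore positional order

markers = [("a", 10, 5), ("b", 20, 3), ("b", 50, 9), ("c", 30, 4)]
first = choose_paralogs(markers, "first")
conserved = choose_paralogs(markers, "conserved")
removed = choose_paralogs(markers, "remove_all")
```

In this made-up example, gene "b" has two copies. First Found keeps the copy at position 20, Conserved keeps the copy at position 50 because its segment is longer, and Remove All drops both.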

Updating the Results

After modifying the parameters, the results may be recomputed by pressing the Update button. Users may recompute the results repeatedly with different parameter values to assess their effect. In particular, because a typical query involving computation of the reversal distance for a pair of mammalian genomes takes only a couple of CPU seconds, one can easily assess the effects of various approximations and different levels of coarse-graining.