Spliced Alignment Approach to Solve Exon Chaining Problem in Gene Prediction

Author: Tigor Nauli
Many organisms contain not only genes but also large amounts of so-called junk DNA that does not code for proteins at all. In particular, most eukaryotic genes are broken into pieces called exons that are separated by this junk DNA. Gene prediction systems cannot rely on fixed deterministic rules, because the jumps between the parts of a split gene are not consistent from species to species. The Exon Chaining problem is a combinatorial puzzle: given a known target protein and a genomic sequence, find a set of candidate exons in the genomic sequence whose concatenation (splicing) best fits the target. In the spliced alignment approach, the alignment with the target protein is used to distinguish true exons from false ones. All possible chains of candidate exons are then explored to find the chain with the highest similarity score to the target protein. Dynamic programming is applied to implement the spliced alignment in a search for genes in a Mus musculus (house mouse) genomic sequence. The candidate exons are constructed from all fragments between potential acceptor sites, represented by AG, and potential donor sites, represented by GT. The optimal alignment found seventeen genes in the genome, more than the thirteen possible genes. The incorrect predictions are possibly associated with an unusually large number of chains of very short potential exons.
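The two steps named in the abstract, building candidate exons between AG and GT sites and then choosing the best chain by dynamic programming, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes each candidate exon already carries a similarity score against the target (the scoring itself is not shown), and it solves only the simpler weighted chaining of non-overlapping intervals rather than the full spliced alignment. The names `candidate_exons`, `best_chain`, and `min_len` are hypothetical.

```python
from bisect import bisect_right


def candidate_exons(genome, min_len=1):
    """Enumerate fragments lying between an AG (acceptor) and a later GT (donor).

    Returns half-open (start, end) intervals, so genome[start:end] is the
    candidate exon without the flanking AG and GT dinucleotides.
    """
    acceptors = [i + 2 for i in range(len(genome) - 1) if genome[i:i + 2] == "AG"]
    donors = [j for j in range(len(genome) - 1) if genome[j:j + 2] == "GT"]
    return [(s, e) for s in acceptors for e in donors if e - s >= min_len]


def best_chain(scored_exons):
    """Return (total_score, chain) for the best chain of non-overlapping exons.

    scored_exons holds (start, end, score) tuples with half-open intervals.
    The scores are assumed inputs, e.g. from aligning each exon to the target.
    """
    exons = sorted(scored_exons, key=lambda x: x[1])  # order by end position
    ends = [end for _, end, _ in exons]
    # best[k] is the best (score, chain) using only the first k sorted exons.
    best = [(0, [])]
    for k, (start, end, score) in enumerate(exons):
        # p counts the earlier exons that end at or before this exon's start.
        p = bisect_right(ends, start, 0, k)
        take = best[p][0] + score
        if take > best[k][0]:
            best.append((take, best[p][1] + [(start, end)]))
        else:
            best.append(best[k])
    return best[-1]
```

Because every AG is paired with every later GT, the number of candidates grows quadratically with the number of sites, and many of them are very short. This is consistent with the abstract's remark about chains of very short potential exons, and a larger `min_len` is one way to prune them in this sketch.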

Proceeding of the International Conference on Mathematics and Sciences (ICOMSc) 2011

ISSN / ISBN / IBSN : 978-602-19142-0-5

Archive No. : LIPI-11058