Newswise — Scientists from the Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences (Research Center of Biotechnology RAS) have developed a mathematical algorithm that finds dispersed repeated elements in a genome with high accuracy. The authors tested the approach on the genetic sequences of nine species of bacteria and discovered previously unknown repeats in all of them. For example, it turned out that almost 50% of the E. coli genome consists of fairly long repeats (400-600 nucleotide pairs long). These repeats represent a distinct code that is superimposed on the existing bacterial genes, on top of the code for amino acids. The newly found dispersed repeats can help identify new genetic targets of interest to biotechnology, for example, regions of DNA whose modification could increase the productivity of bacterial strains. The results of the research are published in the International Journal of Molecular Sciences.

In the genomes of many eukaryotic organisms, from yeast to humans, there are repeated sequences of nucleotides, the “letters” that make up DNA. Each such repeat is several hundred nucleotides long, and the repeats are scattered across the whole genome. Together they form a family that can have a large number of separate copies. The number of such families, as well as the position and number of repeats in each family, differs between species, so they can shed light on the evolution and origin of different living organisms. There are many mathematical algorithms for finding dispersed repeats (those spread more or less evenly across a genome), including algorithms that can detect “corrupted” copies, that is, repeats in which mutations have occurred and whose sequences differ from the others. However, in the course of evolution such changes can become so numerous that two copies are no longer similar enough to be recognized as related. For this reason scientists are looking for new approaches to finding dispersed repeats in the genomes of various organisms. It is important to note that such families of repeats had previously been found only in the genomes of eukaryotes, whereas they were unknown in bacteria.

Scientists from the Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences (Research Center of Biotechnology RAS) have proposed a new method of searching for dispersed repeated sequences. Its principle can be compared to searching for the mathematical matrix, consisting of rows and columns, that best describes a family of repeats. The proposed approach is optimal in terms of the accuracy with which it finds dispersed repeats across the whole genome, because it takes into account nucleotide substitutions as well as insertions and deletions, in other words, mutations.
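
To give a rough feel for how a matrix can describe a family of repeats, here is a minimal sketch of a position weight matrix in Python. It is an illustration only and not the authors' algorithm: the function names (`build_pwm`, `score`, `best_hit`), the pseudocount, and the uniform background frequency of 0.25 are all assumptions made for this example, and the sketch ignores insertions and deletions, which the published method takes into account.

```python
import math

BASES = "ACGT"

def build_pwm(copies, pseudocount=1.0):
    """Build a log-odds matrix (one column per position) from equal-length copies."""
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    pwm = []
    for pos in range(length):
        # Count each base at this position; the pseudocount avoids log(0).
        counts = {b: pseudocount for b in BASES}
        for c in copies:
            counts[c[pos]] += 1
        total = sum(counts.values())
        # Log-odds against an assumed uniform background of 0.25 per base.
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds of a window as wide as the matrix."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score(pwm, sequence[i:i + width]))
    return best, score(pwm, sequence[best:best + width])

# Hypothetical toy family: four short copies, two of them with one substitution.
copies = ["ACGTAC", "ACGTAC", "ACGAAC", "ACCTAC"]
pwm = build_pwm(copies)
position, value = best_hit(pwm, "TTTTTT" + "ACGTAC" + "GGGGGG")
print(position, round(value, 2))
```

In this toy case the highest-scoring window is the embedded copy. A real search would also need to allow for gaps and to assess the statistical significance of each hit.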

The researchers tested the algorithm on artificially generated sequences, each containing thousands of repeats, some of which carried mutations. A comparison with search tools widely used in bioinformatics showed that the proposed method more accurately finds repeats of one family that have a larger number of mutations between them (up to a change of half of the nucleotides in a sequence).
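
A test set of this kind could be generated along the following lines. This is a hedged sketch, not the researchers' actual procedure: the helper names (`random_dna`, `mutate`, `make_test_sequence`), the fixed spacer length, and the seeds are assumptions, and only substitutions are simulated, whereas the published tests also involved insertions and deletions.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def make_test_sequence(repeat, n_copies, spacer_len, rate, seed=0):
    """Interleave random spacers with mutated copies; return (sequence, starts)."""
    rng = random.Random(seed)
    parts, starts, offset = [], [], 0
    for _ in range(n_copies):
        parts.append(random_dna(spacer_len, rng))
        offset += spacer_len
        starts.append(offset)  # where this copy begins in the final sequence
        copy = mutate(repeat, rate, rng)
        parts.append(copy)
        offset += len(copy)
    return "".join(parts), starts

# Hypothetical settings: a 500-base repeat, 1000 copies, half the bases changed.
rng = random.Random(42)
repeat = random_dna(500, rng)
sequence, starts = make_test_sequence(repeat, n_copies=1000, spacer_len=500, rate=0.5)
print(len(sequence), len(starts))
```

Because the true start of every copy is recorded, the output of any search tool can be compared against the known positions to count hits and misses.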

The authors then applied the algorithm to search for repeats in the genomes of nine species of bacteria: Escherichia coli, Bacillus subtilis, Azotobacter vinelandii, Clostridium tetani, Methylococcus capsulatus, Mycobacterium tuberculosis, Shigella sonnei, Treponema pallidum and Yersinia pestis. For the first time, the analysis revealed three families of repeats, 400-600 nucleotide pairs long, in Escherichia coli, which together occupy almost 50% of the bacterium's genome. Previously, only shorter similar elements, up to 300 nucleotide pairs long, were known in this microorganism, and in smaller numbers. In the genetic sequences of the other bacteria, the researchers found 1-2 families of repeats of the same length (400-600 nucleotides). Fewer of them were found in Treponema pallidum, which may be related to the small size of this microorganism's genome.

“The families of dispersed repeats that we found are located in genes, and they represent a distinct code superimposed on genes on top of the triplet code, which provides for the coding of amino acid sequences by genes. It does not matter which DNA strand the genes are located on. The resulting code may serve as a basis for folding DNA into the so-called nucleoid, which largely determines the expression of bacterial genes. It can be said that bacterial DNA contains a code that provides for its folding into a nucleoid, and now we have gained the ability to control it. This opens up great opportunities for creating new microorganisms that are useful for people,” says Eugine Korotkov, Doctor of Biological Sciences, head of the group of mathematical analysis of DNA sequences and proteins at the Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences (Research Center of Biotechnology RAS).

The proposed approach can be used to analyze not only bacterial genomes but also the genetic sequences of multicellular organisms, for example, animals or plants. It can help to better understand the evolution of genomes and their individual elements and, in the case of bacteria, to find targets for creating new antibiotics or for increasing the productivity of strains that are important for biotechnology.

CITATIONS

International Journal of Molecular Sciences