ABSTRACT
Numerous mapping projects conducted on different species have generated an abundance of mapping data. Consequently, many multilocus maps have been constructed for the same organism using diverse mapping populations and marker sets. Map quality varies broadly among populations, marker sets, and the software used, necessitating efforts to integrate the mapping information and generate consensus maps. The problem of multilocus consensus genetic mapping (MCGM) is far more challenging than genetic mapping based on a single dataset, which is itself cumbersome. The additional complications introduced by consensus analysis include inter-population differences in recombination rate and in the distribution of exchanges along chromosomes; variation in the dominance of the employed markers; and the use of different subsets of markers in different labs. Hence, it is necessary to handle arbitrary patterns of shared marker sets and different levels of mapping data quality. In this article, we introduce a two-phase approach for solving MCGM. In phase 1, multilocus ordering is performed for each dataset, combined with iterative jackknife resampling to evaluate the stability of marker orders. In this phase, the ordering problem is reduced to the well-known traveling salesperson problem (TSP): for each dataset, we look for the order that gives the minimum sum of recombination distances between adjacent markers. In phase 2, the optimal consensus order of shared markers is selected from the set of allowed orders as the one that gives the minimal sum of the total lengths of nonconflicting maps of the chromosome. This criterion may be modified to take into account variation in the quality of the original data (population size, marker quality, etc.). In the foregoing formulation, consensus mapping is considered a specific version of TSP that can be referred to as "synchronized TSP." The conflicts detected after phase 1 are resolved using either a heuristic algorithm over the entire chromosome or an exact/heuristic algorithm applied subsequently to the revealed small, non-overlapping conflicting regions, which are separated by nonconflicting regions. The proposed approach was tested on a wide range of simulated data and on real datasets from maize.
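The phase 1 criterion can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a backcross-style genotype matrix (rows are individuals, columns are markers, alleles coded 0/1), uses the raw recombination fraction as the distance, replaces the TSP heuristics with an exhaustive search that is only feasible for a handful of markers, and substitutes simple leave-one-out resampling for the iterative jackknife described above. All function names and the toy data are hypothetical.

```python
from itertools import permutations


def recombination_fraction(genotypes, a, b):
    """Fraction of individuals whose alleles differ between markers a and b."""
    differing = sum(1 for row in genotypes if row[a] != row[b])
    return differing / len(genotypes)


def path_length(order, genotypes):
    """Sum of recombination fractions between adjacent markers in `order`."""
    return sum(
        recombination_fraction(genotypes, a, b)
        for a, b in zip(order, order[1:])
    )


def best_order(genotypes, n_markers):
    """Exhaustive open-path TSP: the order with the minimum total length.

    An order and its reverse describe the same map, so only orders whose
    first marker index is smaller than the last are considered.
    """
    candidates = (
        p for p in permutations(range(n_markers)) if p[0] < p[-1]
    )
    return min(candidates, key=lambda p: path_length(p, genotypes))


def jackknife_support(genotypes, n_markers):
    """Leave-one-out stability (an assumed stand-in for the jackknife step).

    Returns the full-data order and the fraction of resamples in which
    that same order is recovered.
    """
    reference = best_order(genotypes, n_markers)
    hits = 0
    for i in range(len(genotypes)):
        subset = genotypes[:i] + genotypes[i + 1:]
        if best_order(subset, n_markers) == reference:
            hits += 1
    return reference, hits / len(genotypes)


# Toy data (hypothetical): 10 individuals, 4 markers, with two single
# crossovers in each of the three marker intervals.
demo = [
    [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1],
    [0, 1, 1, 1], [1, 0, 0, 0],   # crossover between markers 0 and 1
    [1, 1, 0, 0], [0, 0, 1, 1],   # crossover between markers 1 and 2
    [0, 0, 0, 1], [1, 1, 1, 0],   # crossover between markers 2 and 3
]

order, support = jackknife_support(demo, 4)
print(order, support)
```

On this toy matrix every individual carries at most one crossover, so the distances are additive along the chromosome and the search recovers the marker order 0, 1, 2, 3 in every resample. Real datasets with genotyping errors, missing data, or dozens of markers would need a proper mapping function and a heuristic TSP solver instead of enumeration.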
ABSTRACT
The abundance of repeat elements in the maize genome complicates its assembly. Retrotransposons alone are estimated to constitute at least 50% of the genome. In this paper, we introduce a problem called retroscaffolding, a new variant of the well-known scaffolding problem, which orders and orients a set of assembled contigs in a genome assembly project. The key feature of this new formulation is that it takes advantage of the structural characteristics and abundance of a particular type of retrotransposon, the Long Terminal Repeat (LTR) retrotransposon. This approach is not meant to supplant but rather to complement other scaffolding approaches. The advantages of retroscaffolding are twofold: (i) it allows detection of regions containing LTR retrotransposons within the unfinished portions of a genome and can therefore guide the process of finishing, and (ii) it provides a mechanism to lower sequencing coverage without impacting the quality of the final assembled genic portions. Sequencing and finishing costs dominate the expenditures in whole-genome projects, and it is often desirable, in the interest of saving cost, to reduce the effort spent on repetitive regions of a genome. The retroscaffolding technique provides a viable mechanism to this end. Results of preliminary studies on maize genomic data validate the utility of our approach. We also report on the ongoing development of an algorithmic framework to perform retroscaffolding.
Subjects
Computational Biology/methods , Genome , Retroelements/genetics , Terminal Repeat Sequences , Algorithms , Bacterial Artificial Chromosomes , Contig Mapping , Plant Genes , Genetic Models , Software , Zea mays/genetics
ABSTRACT
Plant-parasitic nematodes are important and cosmopolitan pathogens of crops. Here, we describe the generation and analysis of 1928 expressed sequence tags (ESTs) of a splice-leader 1 (SL1) library from mixed life stages of the root-lesion nematode Pratylenchus penetrans. The ESTs were grouped into 420 clusters and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Approximately 80% of all translated clusters show homology to Caenorhabditis elegans proteins, and 37% of the C. elegans gene homologs had confirmed phenotypes as assessed by RNA interference tests. Use of an SL1-PCR approach, while ensuring the cloning of the 5' ends of mRNAs, showed a bias toward short transcripts. Putative nematode-specific and Pratylenchus-specific genes were identified, and their implications for nematode control strategies are discussed.