Results 1 - 20 of 26
1.
BMC Bioinformatics ; 25(1): 42, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38273275

ABSTRACT

BACKGROUND: The clustering of immune repertoire data is challenging due to the computational cost associated with a very large number of pairwise sequence comparisons. To overcome this limitation, we developed Anchor Clustering, an unsupervised clustering method designed to identify similar sequences from millions of antigen receptor gene sequences. First, a Point Packing algorithm is used to identify a set of maximally spaced anchor sequences. Then, the genetic distance of the remaining sequences to all anchor sequences is calculated and transformed into distance vectors. Finally, distance vectors are clustered using unsupervised clustering. This process is repeated iteratively until the resulting clusters are small enough so that pairwise distance comparisons can be performed. RESULTS: Our results demonstrate that Anchor Clustering is faster than existing pairwise comparison clustering methods while providing similar clustering quality. With its flexible, memory-saving strategy, Anchor Clustering is capable of clustering millions of antigen receptor gene sequences in just a few minutes. CONCLUSIONS: This method enables the meta-analysis of immune-repertoire data from different studies and could contribute to a more comprehensive understanding of the immune repertoire data space.


Subjects
Algorithms , Antigen Receptors , Cluster Analysis
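
Note: the anchor step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Hamming distance stands in for the unspecified "genetic distance", and a greedy farthest-first rule stands in for the Point Packing algorithm. The function names `pick_anchors` and `distance_vectors` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pick_anchors(seqs, k):
    """Greedy farthest-first choice of k widely spaced anchors (assumed
    stand-in for the paper's Point Packing step)."""
    anchors = [seqs[0]]  # arbitrary starting anchor
    while len(anchors) < k:
        # Pick the sequence whose nearest existing anchor is farthest away.
        best = max(seqs, key=lambda s: min(hamming(s, a) for a in anchors))
        anchors.append(best)
    return anchors


def distance_vectors(seqs, anchors):
    """Map each sequence to its tuple of distances to every anchor."""
    return {s: tuple(hamming(s, a) for a in anchors) for s in seqs}


seqs = ["AAAAAA", "AAAAAT", "CCCCCC", "CCCCCG", "GGGGGG", "GGGGGA"]
anchors = pick_anchors(seqs, 3)
vectors = distance_vectors(seqs, anchors)
for seq, vec in vectors.items():
    print(seq, vec)
```

Each sequence is then compared only with the k anchors rather than with every other sequence, so the number of distance computations grows with n·k instead of n². The resulting vectors would be passed to any unsupervised clustering routine, which is omitted here.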
2.
PLoS One ; 18(7): e0288388, 2023.
Article in English | MEDLINE | ID: mdl-37440576

ABSTRACT

Intrinsically disordered proteins (IDPs) are proteins that lack a stable 3D structure but maintain a biological function. It has been frequently suggested that IDPs are difficult to align because they tend to have fewer conserved residues compared to ordered proteins, but to our knowledge this has never been directly tested. To compare the alignments of ordered proteins to IDPs, their multiple sequence alignments (MSAs) were assessed using two different methods. The first compared the similarity between MSAs produced using the same sequences but created with Clustal Omega, MAFFT, and MUSCLE. The second assessed MSAs based on how well they recapitulated the species tree. These two methods measure the "correctness" of an MSA with two different approaches; the first method measures consistency while the second measures the underlying phylogenetic signal. Proteins that contained both regions of disorder and order were analyzed along with proteins that were fully disordered and fully ordered, using nucleotide, codon and peptide sequence alignments. We observed that IDPs had less similar MSAs than ordered proteins, which is most likely linked to the lower sequence conservation in IDPs. However, comparisons of tree distances found that trees from the ordered sequence MSAs were not significantly closer to the species tree than those inferred from disordered sequence MSAs. Our results show that it is correct to say that IDPs are difficult to align on the basis of MSA consistency, but that this does not equate with alignments being of poor quality when assessed by their ability to correctly infer a species tree.


Subjects
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/genetics , Intrinsically Disordered Proteins/chemistry , Phylogeny , Sequence Alignment
4.
Biosystems ; 202: 104352, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33503467

ABSTRACT

Social dilemma games are studied to gain insight into why humans cooperate with other unrelated people. The canonical game has cooperation and defection as the two strategies. Cooperation benefits the group, but a self-interested player can always do better by defecting. But if everybody defects, then the entire group loses. This tradeoff between cooperation and defection gives rise to the social dilemma. Social dilemma games need some method to evolve strategy changes between rounds. The two most widely accepted methods are the Moran process and replicator equations. Although both methods can predict how strategies evolve in a player population, no comparison of their performance has yet been made. In this paper we compare them in a public goods game, which is an N-player version of the prisoner's dilemma (N>2). Our results indicate only one of these methods should be used in future research efforts.


Subjects
Cooperative Behavior , Game Theory , Social Interaction , Humans , Random Allocation
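
Note: the dilemma described in the abstract above can be made concrete with the standard textbook public goods payoff. This is a sketch under assumptions: the abstract gives no parameter values, so the contribution `cost` and the multiplication factor `r` are illustrative, and the strategy-update rules (Moran process, replicator equations) are not modeled.

```python
def payoffs(n_players, n_cooperators, cost=1.0, r=3.0):
    """Return (cooperator payoff, defector payoff) for one round.

    Each cooperator pays `cost` into a common pot; the pot is multiplied
    by `r` and shared equally among all players, cooperators or not.
    """
    share = r * cost * n_cooperators / n_players
    return share - cost, share


N = 5
everyone_cooperates, _ = payoffs(N, N)       # payoff when all 5 contribute
_, lone_defector = payoffs(N, N - 1)         # one player stops contributing
everyone_defects = payoffs(N, 0)[1]          # nobody contributes

print(everyone_cooperates, lone_defector, everyone_defects)
```

With 1 < r < N, a single defector earns more than a cooperator in the same group, yet universal defection leaves everyone with nothing, which is the tradeoff the abstract describes.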
5.
BMC Bioinformatics ; 20(1): 328, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31195955

ABSTRACT

BACKGROUND: Detection of central nodes in asymmetrically directed biological networks depends on centrality metrics quantifying individual nodes' importance in a network. In topological analyses on metabolic networks, various centrality metrics have been mostly applied to metabolite-centric graphs. However, centrality metrics including those not depending on high connections are largely unexplored for directed reaction-centric graphs. RESULTS: We applied directed versions of centrality metrics to directed reaction-centric graphs of microbial metabolic networks. To investigate the local role of a node, we developed a novel metric, cascade number, considering how many nodes are closed off from information flow when a particular node is removed. High modularity and scale-freeness were found in the directed reaction-centric graphs, and nodes with high betweenness centrality tended to belong to densely connected modules. Cascade number and bridging centrality identified cascade subnetworks controlling local information flow and irreplaceable bridging nodes between functional modules, respectively. Reactions highly ranked with bridging centrality and cascade number tended to be essential, compared to reactions that other centrality metrics detected. CONCLUSIONS: We demonstrate that cascade number and bridging centrality are useful to identify key reactions controlling local information flow in directed reaction-centric graphs of microbial metabolic networks. Knowledge about the local flow connectivity and connections between local modules will contribute to understanding how metabolic pathways are assembled.


Subjects
Bacteria/metabolism , Metabolic Networks and Pathways , Escherichia coli/metabolism
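
Note: the abstract above defines cascade number only informally ("how many nodes are closed off from information flow when a particular node is removed"). The sketch below is one possible reading, stated as an assumption: a node counts as closed off once every one of its incoming edges originates from the removed node or from another closed-off node. The function name `cascade_number` and the toy graph are hypothetical, and the published algorithm may differ.

```python
def cascade_number(edges, removed):
    """Count nodes that lose all incoming flow after `removed` is deleted.

    edges: iterable of (source, target) pairs of a directed graph.
    Assumed rule: a node is closed off when it has at least one incoming
    edge and all of its predecessors are removed or closed off.
    """
    preds = {}
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        preds.setdefault(v, set()).add(u)

    closed = {removed}
    changed = True
    while changed:  # repeat until no further node is closed off
        changed = False
        for node in nodes - closed:
            incoming = preds.get(node, set())
            if incoming and incoming <= closed:
                closed.add(node)
                changed = True
    return len(closed) - 1  # exclude the removed node itself


# Toy reaction graph: A feeds B; B feeds C and D; E also feeds D; D feeds F.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "D"), ("D", "F")]
for node in "ABDE":
    print(node, cascade_number(edges, node))
```

In the toy graph, removing A closes off B and then C, while D survives because it still receives input from E.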
6.
J Theor Biol ; 472: 36-45, 2019 Jul 7.
Article in English | MEDLINE | ID: mdl-30954506

ABSTRACT

There have been longstanding concerns about the stability of hierarchical clustering. A suggested explanation for this instability is the presence of "rogue taxa", i.e. taxa whose removal from a data set can apparently restore stability. In this study, the rogue taxa hypothesis is tested by partitioning a large data set into many smaller ones and checking for rogue behavior. The checking was performed with a standard hierarchical clustering algorithm and with a novel algorithm designed to have greater stability. It was found that rogue taxa cannot reasonably be said to exist because the status of being a rogue taxon depends on the data partition in which the taxon is embedded. In addition to the choice of data used, the choice of algorithm and algorithm parameters can have a large effect on the degree to which a taxon appears rogue. Instability in hierarchical clustering can be increased by problematic data points, but the status of data points being problematic depends not on their biological antecedents, but on their position in the local geometry of the data. The results of this study strongly suggest that instability in traditional hierarchical clustering routines is primarily a problem with the algorithm design.


Subjects
Genetic Models , Phylogeny , Algorithms , Cluster Analysis , Iris Plant/classification
7.
PLoS One ; 14(2): e0211813, 2019.
Article in English | MEDLINE | ID: mdl-30726271

ABSTRACT

Dehydrins, plant proteins that are upregulated during dehydration stress conditions, have modular sequences that can contain three conserved motifs (the Y-, S-, and K-segments). The presence and order of these motifs are used to classify dehydrins into one of five architectures: Kn, SKn, KnS, YnKn, and YnSKn, where the subscript n describes the number of copies of that motif. In this study, an architectural and phylogenetic analysis was performed on 426 dehydrin sequences that were identified in 53 angiosperm and 3 gymnosperm genomes. It was found that angiosperms contained all five architectures, while gymnosperms only contained Kn and SKn dehydrins. This suggests that the ancestral dehydrin in spermatophytes was either Kn or SKn, and the Y-segment containing dehydrins first arose in angiosperms. A high-level split between the YnSKn dehydrins from either the Kn or SKn dehydrins could not be confidently identified; however, two lower-level architectural divisions appear to have occurred after different duplication events. The first likely occurred after a whole genome duplication, resulting in the duplication of a Y3SK2 dehydrin; the duplicate subsequently lost an S- and K-segment to become a Y3K1 dehydrin. The second split occurred after a tandem duplication of a Y1SK2 dehydrin, where the duplicate lost both the Y- and S-segment and gained four K-segments, resulting in a K6 dehydrin. We suggest that the newly arisen Y3K1 dehydrin is possibly on its way to pseudogenization, while the newly arisen K6 dehydrin developed a novel function in cold protection.


Subjects
Cycadopsida/genetics , Molecular Evolution , Gene Duplication , Plant Genome , Magnoliopsida/genetics , Phylogeny , Plant Proteins/genetics , Protein Databases
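
Note: the five-architecture naming scheme in the abstract above can be illustrated with a small classifier. This is a sketch under assumptions: it takes an already-annotated string of segment letters in N- to C-terminal order (for example "YYYSKK") rather than a protein sequence, motif detection is out of scope, and the function names are hypothetical.

```python
import re
from itertools import groupby

# Regular expressions over segment letters, most specific pattern first.
ARCHITECTURES = [
    ("YnSKn", r"Y+SK+"),
    ("YnKn", r"Y+K+"),
    ("SKn", r"SK+"),
    ("KnS", r"K+S"),
    ("Kn", r"K+"),
]


def classify_dehydrin(segments: str):
    """Return the architecture class, or None if no pattern fits."""
    for name, pattern in ARCHITECTURES:
        if re.fullmatch(pattern, segments):
            return name
    return None


def architecture_label(segments: str) -> str:
    """Write copy numbers explicitly, e.g. 'YYYSKK' -> 'Y3SK2'."""
    parts = []
    for letter, run in groupby(segments):
        count = len(list(run))
        # The S-segment is written without a copy number in the abstract.
        parts.append(letter if letter == "S" else f"{letter}{count}")
    return "".join(parts)


for s in ["YYYSKK", "YYYK", "YSKK", "KKKKKK", "KKS"]:
    print(s, classify_dehydrin(s), architecture_label(s))
```

The examples reproduce the labels used in the abstract: Y3SK2 and its Y3K1 duplicate, and Y1SK2 and its K6 duplicate.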
8.
Protoplasma ; 255(6): 1855-1876, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29774409

ABSTRACT

Starch is a water-insoluble polyglucan synthesized inside the plastid stroma within plant cells, serving a crucial role in the carbon budget of the whole plant by acting as a short-term and long-term store of energy. The highly complex, hierarchical structure of the starch granule arises from the actions of a large suite of enzyme activities, in addition to physicochemical self-assembly mechanisms. This review outlines current knowledge of the starch biosynthetic pathway operating in plant cells in relation to the micro- and macro-structures of the starch granule. We highlight the gaps in our knowledge, in particular, the relationship between enzyme function and operation at the molecular level and the formation of the final, macroscopic architecture of the granule.


Subjects
Plants/metabolism , Plastids/metabolism , Starch/metabolism , Biological Models , Phosphorylation , Starch/biosynthesis , Starch/chemistry
9.
Biosystems ; 162: 205-214, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29097246

ABSTRACT

Graphs can be used as contact networks in models of epidemic spread. Most research seeks to extract the properties of an extant graph, derived from questionnaires or other sources of contact information. The inverse problem of searching the space of graphs for those that exhibit specific properties has received little attention and that is the focus of this study. This is, in part, because searching the space of contact networks is difficult. This paper extends and tests a representation for searching the space of contact networks with evolutionary computation. The focus of this study is on improvements in the representation used to evolve potential contact networks, adding an operator that permits strictly local adjustments to connectivity of the network, and another that does nothing at all. The benefits of doing nothing at some points during the construction of a network are substantial, because this permits evolution to adjust the number of active commands issued automatically. Adjusting local connectivity was identified as a beneficial feature in earlier research. The network induction method is tested on two tasks; finding a network that sustains an epidemic as long as possible and finding a network that, under simulation, closely matches a specified pattern of rise and fall in the number of infections.


Subjects
Algorithms , Computer Graphics , Epidemics , Infections/epidemiology , Computer Simulation , Humans , Infections/diagnosis , Biological Models , Time Factors
10.
Biosystems ; 150: 35-45, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27521768

ABSTRACT

DNA fragment assembly, an NP-hard problem, is one of the major steps in DNA sequencing. Multiple strategies have been used for this problem, including greedy graph-based algorithms, de Bruijn graphs, and the overlap-layout-consensus approach. This study focuses on the overlap-layout-consensus approach. Heuristics and computational intelligence methods are combined to exploit their respective benefits. These algorithm combinations were able to produce high quality results surpassing the best results obtained by a number of competitive algorithms specially designed and tuned for this problem on thirteen of sixteen popular benchmarks. This work also reinforces the necessity of using multiple search strategies as it is clearly observed that algorithm performance is dependent on problem instance; without a deeper look into many searches, top solutions could be missed entirely.


Subjects
Algorithms , Artificial Intelligence , DNA Fragmentation , DNA/genetics , DNA Sequence Analysis/methods , Animals , Humans
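
Note: the overlap step that underlies the assembly strategies named in the abstract above can be sketched with a suffix-prefix overlap function and a naive greedy merge. This is a simple baseline under assumptions (error-free reads, a single strand, a hypothetical `min_len` threshold); it is not the hybrid heuristic evaluated in the paper.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, index of a, index of b)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:  # no remaining overlaps: leave separate contigs
            break
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags.append(merged)
    return frags


reads = ["ATGCGT", "CGTTAC", "TACGGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Greedy merging commits to the locally best overlap and can be misled by repeats, which is one reason fragment assembly is treated as a search problem.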
11.
Biosystems ; 114(3): 178-85, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24051263

ABSTRACT

This paper examines the use of evolutionary algorithms in the development of antibiotic regimens given to production animals. A model is constructed that combines the lifespan of the animal and the bacteria living in the animal's gastro-intestinal tract from the early finishing stage until the animal reaches market weight. This model is used as the fitness evaluation for a set of graph based evolutionary algorithms to assess the impact of diversity control on the evolving antibiotic regimens. The graph based evolutionary algorithms have two objectives: to find an antibiotic treatment regimen that maintains the weight gain and health benefits of antibiotic use and to reduce the risk of spreading antibiotic resistant bacteria. This study examines different regimens of tylosin phosphate use on bacteria populations divided into Gram positive and Gram negative types, with a focus on Campylobacter spp. Treatment regimens were found that provided decreased antibiotic resistance relative to conventional methods while providing nearly the same benefits as conventional antibiotic regimes. By using a graph to control the information flow in the evolutionary algorithm, a variety of solutions along the Pareto front can be found automatically for this and other multi-objective problems.


Subjects
Algorithms , Animal Diseases/prevention & control , Animal Husbandry/methods , Anti-Bacterial Agents/therapeutic use , Bacterial Infections/veterinary , Livestock/growth & development , Theoretical Models , Animal Diseases/microbiology , Animals , Bacterial Infections/prevention & control , Computational Biology/methods , Livestock/microbiology , Tylosin
12.
Biosystems ; 113(1): 9-27, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23603215

ABSTRACT

The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This study introduces a sequence-learning technology called side effect machines. It also applies a model of evolution which simulates the evolution of a ring species to the training of the side effect machines. A comparison is done between side effect machines evolved in the ring structure and side effect machines evolved using a standard evolutionary algorithm based on tournament selection. At the core of the training of side effect machines is a nearest neighbor classifier. A parameter study was performed to investigate the impact of the division of training data into examples for nearest neighbor assessment and training cases. The parameter study demonstrates that parameter setting is important in the baseline runs but had little impact in the ring-optimization runs. The ring optimization technique was also found to exhibit improved and also more reliable training performance. Side effect machines are tested on two types of synthetic data, one based on GC-content and the other checking for the ability of side effect machines to recognize an embedded motif. Three types of biological data are used, a data set with different types of immune-system genes, a data set with normal and retro-virally derived human genomic sequence, and standard and nonstandard initiation regions from the cytochrome-oxidase subunit one in the mitochondrial genome.


Subjects
Algorithms , Artificial Intelligence , DNA/genetics , Genetic Models , Animals , Base Composition/genetics , Base Sequence , Computational Biology/methods , DNA/chemistry , DNA/classification , Molecular Evolution , Mitochondrial Genome/genetics , Humans , Major Histocompatibility Complex/genetics , Reproducibility of Results , DNA Sequence Analysis/methods
13.
BMC Plant Biol ; 13: 42, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23497159

ABSTRACT

BACKGROUND: The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. RESULTS: A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. CONCLUSIONS: An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.


Subjects
Anthocyanins/biosynthesis , Biosynthetic Pathways , Computational Biology/methods , Flavonoids/biosynthesis , Plant Proteins/genetics , Genetic Promoter Regions , Software , Zea mays/genetics , Algorithms , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis/metabolism , Base Sequence , Computational Biology/instrumentation , Molecular Sequence Data , Plant Proteins/metabolism , Zea mays/growth & development , Zea mays/metabolism
14.
Biosystems ; 110(1): 1-8, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22771982

ABSTRACT

DNA error correcting codes over the edit metric consist of embeddable markers for sequencing projects that are tolerant of sequencing errors. When a genetic library has multiple sources for its sequences, use of embedded markers permits tracking of sequence origin. This study compares different methods for synthesizing DNA error correcting codes. A new code-finding technique called the salmon algorithm is introduced and used to improve the size of best known codes in five difficult cases of the problem, including the most studied case: length six, distance three codes. An updated table of the best known code sizes with 36 improved values, resulting from three different algorithms, is presented. Mathematical background results for the problem from multiple sources are summarized. A discussion of practical details that arise in application, including biological design and decoding, is also given in this study.


Subjects
Algorithms , DNA , Computational Biology , DNA Repair , DNA Replication , Gene Library
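
Note: the code-design problem in the abstract above can be illustrated with a plain greedy construction. This is a baseline sketch under assumptions, not the salmon algorithm (which the abstract does not describe): words are scanned in lexicographic order and kept if they lie at edit distance at least `min_dist` from every word already chosen. Length four is used here only to keep the run short.

```python
from itertools import combinations, product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def greedy_code(length: int, min_dist: int, alphabet: str = "ACGT"):
    """Keep each word that is far enough from all previously kept words."""
    code = []
    for word in map("".join, product(alphabet, repeat=length)):
        if all(edit_distance(word, c) >= min_dist for c in code):
            code.append(word)
    return code


code = greedy_code(length=4, min_dist=3)
print(len(code), code)
```

A code with minimum edit distance three can correct any single substitution, insertion or deletion, which is what makes such words usable as error-tolerant embedded markers.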
15.
J Theor Biol ; 264(4): 1202-13, 2010 Jun 21.
Article in English | MEDLINE | ID: mdl-20298702

ABSTRACT

Ring species are a biological complex that theoretically forms when an ancestral population extends its range around a geographic barrier and, despite low-level gene flow, differentiates until reproductive isolation exists when terminal populations come into secondary contact. Due to their rarity in nature, little is known about the biological factors that promote the formation of ring species. We use evolutionary algorithms operating on two simple computational problems (SAW and K-max) to study the process of speciation under the conditions which may yield ring species. We vary evolutionary parameters to measure their influence on ring species' development and stability over evolutionary time. Using the SAW problem, ring species consistently form, i.e. fertility is negatively correlated with distance (R-values between -0.097 and -0.821, p<0.001), and terminal populations show substantial infertility. However, all SAW simulations demonstrate instability in the complex after sympatric zones are established between terminal populations. Higher mutation rates and larger dispersal/breeding radii promote ring species' formation and stability. Using a problem with a simple fitness landscape, the K-max problem, ring species do not form. Instead, speciation around the ring occurs before ring closure as good genotypes become locally dominant.


Subjects
Population Genetics , Genetic Models , Reproduction , Algorithms , Animals , Biological Evolution , Computer Simulation , Fertility , Genotype , Geography
16.
BMC Bioinformatics ; 10: 260, 2009 Aug 22.
Article in English | MEDLINE | ID: mdl-19698124

ABSTRACT

BACKGROUND: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. RESULTS: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. CONCLUSION: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.


Subjects
Cluster Analysis , Computational Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Genetic Databases , Automated Pattern Recognition
17.
J Genet Genomics ; 35(10): 603-16, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18937917

ABSTRACT

The maize (Zea mays) spikelet consists of two florets, each of which contains three developmentally synchronized anthers. Morphologically, the anthers in the upper and lower florets proceed through apparently similar developmental programs. To test for global differences in gene expression and to identify genes that are coordinately regulated during maize anther development, RNA samples isolated from upper and lower floret anthers at six developmental stages were hybridized to cDNA microarrays. Approximately 9% of the tested genes exhibited statistically significant differences in expression between anthers in the upper and lower florets. This finding indicates that several basic biological processes are differentially regulated between upper and lower floret anthers, including metabolism, protein synthesis and signal transduction. Genes that are coordinately regulated across anther development were identified via cluster analysis. Analysis of these results identified stage-specific, early in development, late in development and bi-phasic expression profiles. Quantitative RT-PCR analysis revealed that four genes whose homologs in other plant species are involved in programmed cell death are up-regulated just prior to the time the tapetum begins to visibly degenerate (i.e., the mid-microspore stage). This finding supports the hypothesis that developmentally normal tapetal degeneration occurs via programmed cell death.


Subjects
Apoptosis , Flowers/cytology , Flowers/genetics , Plant Gene Expression Regulation , Zea mays/cytology , Zea mays/genetics , Cluster Analysis , Flowers/growth & development , Flowers/metabolism , Gene Expression Profiling , Plant Genes/genetics , Oligonucleotide Array Sequence Analysis , Plant Proteins/biosynthesis , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Time Factors , Up-Regulation , Zea mays/growth & development , Zea mays/metabolism
18.
Genetics ; 175(1): 429-39, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17110490

ABSTRACT

As an ancient segmental tetraploid, the maize (Zea mays L.) genome contains large numbers of paralogs that are expected to have diverged by a minimum of 10% over time. Nearly identical paralogs (NIPs) are defined as paralogous genes that exhibit ≥98% identity. Sequence analyses of the "gene space" of the maize inbred line B73 genome, coupled with wet lab validation, have revealed that, conservatively, at least approximately 1% of maize genes have a NIP, a rate substantially higher than that in Arabidopsis. In most instances, both members of maize NIP pairs are expressed and are therefore at least potentially functional. Of evolutionary significance, members of many NIP families also exhibit differential expression. The finding that some families of maize NIPs are closely linked genetically while others are genetically unlinked is consistent with multiple modes of origin. NIPs provide a mechanism for the maize genome to circumvent the inherent limitation that diploid genomes can carry at most two "alleles" per "locus." As such, NIPs may have played important roles during the evolution and domestication of maize and may contribute to the success of long-term selection experiments in this important crop species.


Subjects
Molecular Evolution , Plant Genome , Plant Proteins/genetics , Zea mays/genetics , Arabidopsis/genetics , Base Sequence , Plant DNA/chemistry , Plant DNA/genetics , Molecular Sequence Data , Genetic Selection , Nucleic Acid Sequence Homology
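
Note: the 98% identity threshold that defines NIPs in the abstract above can be illustrated with a simple check on two pre-aligned sequences. This is a sketch under assumptions: the abstract does not say how identity was computed, so identity here is the fraction of alignment columns with matching bases, gap columns ("-") count as mismatches, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_nip(a: str, b: str, threshold: float = 98.0) -> bool:
    """True if the aligned pair meets the nearly-identical-paralog cutoff."""
    return percent_identity(a, b) >= threshold


gene = "ACGT" * 25                 # toy 100-nt sequence
two_changes = "TT" + gene[2:]      # differs at 2 of 100 positions
three_changes = "TTT" + gene[3:]   # differs at 3 of 100 positions

print(percent_identity(gene, two_changes), is_nip(gene, two_changes))
print(percent_identity(gene, three_changes), is_nip(gene, three_changes))
```

On a 100-nt alignment the cutoff tolerates two differences but not three.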
19.
Genetics ; 174(3): 1671-83, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16951074

ABSTRACT

A new genetic map of maize, ISU-IBM Map4, that integrates 2029 existing markers with 1329 new indel polymorphism (IDP) markers has been developed using intermated recombinant inbred lines (IRILs) from the intermated B73xMo17 (IBM) population. The website http://magi.plantgenomics.iastate.edu provides access to IDP primer sequences, sequences from which IDP primers were designed, optimized marker-specific PCR conditions, and polymorphism data for all IDP markers. This new gene-based genetic map will facilitate a wide variety of genetic and genomic research projects, including map-based genome sequencing and gene cloning. The mosaic structures of the genomes of 91 IRILs, an important resource for identifying and mapping QTL and eQTL, were defined. Analyses of segregation data associated with markers genotyped in three B73/Mo17-derived mapping populations (F2, Syn5, and IBM) demonstrate that allele frequencies were significantly altered during the development of the IBM IRILs. The observations that two segregation distortion regions overlap with maize flowering-time QTL suggest that the altered allele frequencies were a consequence of inadvertent selection. Detection of two-locus gamete disequilibrium provides another means to extract functional genomic data from well-characterized plant RILs.


Subjects
Chromosome Mapping , Genetic Crosses , Plant Genes , Genetic Recombination , Zea mays/genetics , Alleles , Base Sequence , Plant Chromosomes , Expressed Sequence Tags , Gene Frequency , Genetic Markers , Molecular Sequence Data , Genetic Polymorphism , Quantitative Trait Loci
20.
Proc Natl Acad Sci U S A ; 102(34): 12282-7, 2005 Aug 23.
Article in English | MEDLINE | ID: mdl-16103354

ABSTRACT

Recent sequencing efforts have targeted the gene-rich regions of the maize (Zea mays L.) genome. We report the release of an improved assembly of maize assembled genomic islands (MAGIs). The 114,173 resulting contigs have been subjected to computational and physical quality assessments. Comparisons to the sequences of maize bacterial artificial chromosomes suggest that at least 97% (160 of 165) of MAGIs are correctly assembled. Because the rates at which junction-testing PCR primers for genomic survey sequences (90-92%) amplify genomic DNA are not significantly different from those of control primers (approximately 91%), we conclude that a very high percentage of genic MAGIs accurately reflect the structure of the maize genome. EST alignments, ab initio gene prediction, and sequence similarity searches of the MAGIs are available at the Iowa State University MAGI web site. This assembly contains 46,688 ab initio predicted genes. The expression of almost half (628 of 1,369) of a sample of the predicted genes that lack expression evidence was validated by RT-PCR. Our analyses suggest that the maize genome contains between approximately 33,000 and approximately 54,000 expressed genes. Approximately 5% (32 of 628) of the maize transcripts discovered do not have detectable paralogs among maize ESTs or detectable homologs from other species in the GenBank NR nucleotide/protein database. Analyses therefore suggest that this assembly of the maize genome contains approximately 350 previously uncharacterized expressed genes. We hypothesize that these "orphans" evolved quickly during maize evolution and/or domestication.


Subjects
Contig Mapping/methods , Plant Genes/genetics , Plant Genome , Genomic Islands/genetics , Genomics/methods , Zea mays/genetics , Bacterial Artificial Chromosomes , Computational Biology , DNA Primers , Reverse Transcriptase Polymerase Chain Reaction , DNA Sequence Analysis