1.
Virology ; 464-465: 406-414, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25128762

ABSTRACT

To deepen our understanding of early rectal transmission of HIV-1, we studied virus-host interactions in the rectal mucosa using the simian immunodeficiency virus (SIV)-Indian rhesus macaque model and mRNA deep sequencing. We found that the rectal mucosa actively responded to SIV as early as 3 days post-rectal inoculation (dpi) and mobilized more robust responses at 6 and 10 dpi. Our results suggest that the failure of the host to contain virus replication at the portal of entry is attributable to two factors: high-level expression of lymphocyte chemoattractant, proinflammatory and immune activation genes, which can recruit virus-susceptible target cells into the mucosa and activate them; and high-level expression of SIV accessory genes, which are known to counter and evade host restriction factors and innate immune responses. This study provides new insights into the mechanism of rectal transmission.


Subjects
HIV Infections/transmission , Intestinal Mucosa/virology , Rectum/virology , Simian Acquired Immunodeficiency Syndrome/transmission , Simian Immunodeficiency Virus/physiology , Animals , Cytokines/genetics , Cytokines/immunology , Disease Models, Animal , HIV Infections/genetics , HIV Infections/immunology , HIV Infections/virology , HIV-1/genetics , HIV-1/physiology , Host-Pathogen Interactions , Humans , Macaca mulatta , Male , Simian Acquired Immunodeficiency Syndrome/genetics , Simian Acquired Immunodeficiency Syndrome/immunology , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/genetics
2.
PLoS One ; 8(4): e59537, 2013.
Article in English | MEDLINE | ID: mdl-23560051

ABSTRACT

BACKGROUND: Angiotensin-converting enzyme (ACE) (EC 3.4.15.1) metabolizes many biologically active peptides and plays a key role in blood pressure regulation and vascular remodeling. Elevated ACE levels are associated with different cardiovascular and respiratory diseases. METHODS AND RESULTS: Two Belgian families with an 8- to 16-fold increase in blood ACE level were incidentally identified. A novel heterozygous splice site mutation of intron 25 - IVS25+1G>A (c.3691+1G>A) - cosegregating with elevated plasma ACE was identified in both pedigrees. Messenger RNA analysis revealed that the mutation led to the retention of intron 25 and the generation of a premature termination codon. Subjects harboring the mutation were mostly normotensive and had no left ventricular hypertrophy or cardiovascular disease. The levels of renin-angiotensin-aldosterone system components in the mutated cases and wild-type controls were similar, both at baseline and after 50 mg captopril. Flow cytometry of dendritic cells derived from peripheral blood monocytes showed that, compared with non-affected members, affected members had a 50% decrease in ACE surface expression and a 3-fold increase in ACE shedding. Together with a dramatic increase in circulating ACE levels, these findings argue in favor of deletion of the transmembrane anchor, leading to direct secretion of ACE out of cells. CONCLUSIONS: We describe a novel mutation of the ACE gene associated with a major familial elevation of circulating ACE, without evidence of activation of the renin-angiotensin system, target organ damage or cardiovascular complications. These data are consistent with the hypothesis that membrane-bound ACE, rather than circulating ACE, is responsible for Angiotensin II generation and its cardiovascular consequences.


Subjects
Asymptomatic Diseases , Base Sequence , Dendritic Cells/metabolism , Peptidyl-Dipeptidase A/genetics , Renin-Angiotensin System/genetics , Sequence Deletion , Adolescent , Adult , Aged , Angiotensin-Converting Enzyme Inhibitors/pharmacology , Blood Pressure/drug effects , Captopril/pharmacology , Dendritic Cells/cytology , Dendritic Cells/drug effects , Female , Gene Expression , Heterozygote , Humans , Male , Middle Aged , Molecular Sequence Data , Pedigree , Peptidyl-Dipeptidase A/blood , Renin-Angiotensin System/drug effects
3.
PLoS One ; 7(11): e50147, 2012.
Article in English | MEDLINE | ID: mdl-23226242

ABSTRACT

BACKGROUND: Bovine tuberculosis (bTB) is an enduring contagious disease of cattle that has caused substantial losses to the global livestock industry. Despite large-scale eradication efforts, bTB continues to persist. Current bTB tests rely on the measurement of immune responses in vivo (skin tests), and in vitro (bovine interferon-γ release assay). Recent developments are characterized by interrogating the expression of an increasing number of genes that participate in the immune response. Currently used assays have the disadvantages of limited sensitivity and specificity, which may lead to incomplete eradication of bTB. Moreover, bTB that reemerges from wild disease reservoirs requires early and reliable diagnostics to prevent further spread. In this work, we use high-throughput sequencing of the peripheral blood mononuclear cell (PBMC) transcriptome to identify an extensive panel of genes that participate in the immune response. We also investigate the possibility of developing a reliable bTB classification framework based on RNA-Seq reads. METHODOLOGY/PRINCIPAL FINDINGS: Pooled PBMC mRNA samples from unaffected calves as well as from those with disease progression of 1 and 2 months were sequenced using the Illumina Genome Analyzer II. More than 90 million reads were splice-aligned against the reference genome and deposited in a database for further expression analysis and visualization. Using this database, we identified 2,312 genes that were differentially expressed in response to bTB infection (p < 10^-8). We achieved a classification accuracy for bTB infection status of more than 99% with split-sample validation on newly designed and learned mixtures of expression profiles. CONCLUSIONS/SIGNIFICANCE: We demonstrated that bTB can be accurately diagnosed at the early stages of disease progression based on RNA-Seq high-throughput sequencing. The inclusion of multiple genes in the diagnostic panel, combined with the superior sensitivity and broader dynamic range of RNA-Seq, has the potential to improve the accuracy of bTB diagnostics. The computational pipeline used for the project is available from http://code.google.com/p/bovine-tb-prediction.


Subjects
High-Throughput Nucleotide Sequencing , Leukocytes, Mononuclear/metabolism , Mycobacterium bovis/immunology , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Tuberculosis, Bovine/diagnosis , Tuberculosis, Bovine/genetics , Animals , Cattle , Gene Expression Profiling , Leukocytes, Mononuclear/cytology , Leukocytes, Mononuclear/microbiology , Male , RNA, Messenger/immunology , Sensitivity and Specificity , Transcriptome , Tuberculosis, Bovine/immunology , Tuberculosis, Bovine/microbiology
4.
Bioinformatics ; 28(21): 2797-803, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22954626

ABSTRACT

MOTIVATION: Microsatellites are among the most useful genetic markers in population biology. High-throughput sequencing of microsatellite-enriched libraries dramatically expedites the traditional process of screening recombinant libraries for microsatellite markers. However, sorting through millions of reads to distill high-quality polymorphic markers requires special algorithms tailored to tolerate sequencing errors in locus reconstruction, distinguish paralogous loci, rarify raw reads originating from the same amplicon and sort out various artificial fragments resulting from recombination or concatenation of auxiliary adapters. Existing programs warrant improvement. RESULTS: We describe a microsatellite prediction framework named HighSSR for microsatellite genotyping based on high-throughput sequencing. We demonstrate the utility of HighSSR in comparison to Roche gsAssembler on two Roche 454 GS FLX runs. The majority of the HighSSR-assembled loci were reliably mapped against model organism reference genomes. HighSSR demultiplexes pooled libraries, assesses locus polymorphism and implements Primer3 for the design of PCR primers flanking polymorphic microsatellite loci. As sequencing costs drop and permit the analysis of all project samples on next-generation platforms, this framework can also be used for direct simple sequence repeats genotyping. AVAILABILITY: http://code.google.com/p/highssr/
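
The core idea of spotting a simple sequence repeat in a read can be illustrated with a short sketch. This is not the HighSSR algorithm: it is a minimal regular-expression scan for perfect tandem repeats, and the function name `find_ssrs` and its thresholds are assumptions chosen for illustration. It does not tolerate sequencing errors, separate paralogous loci, or remove adapter artifacts, all of which the abstract names as the hard part.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq.

    Hypothetical helper for illustration only; thresholds are assumptions.
    """
    # Lazy quantifier: prefer the shortest repeat unit, then require it to
    # be followed by at least (min_repeats - 1) further exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

A real pipeline would additionally keep the flanking sequence on both sides of each hit, since primer design needs unique flanks.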


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Animals , Butterflies/classification , Butterflies/genetics , Fungi/classification , Fungi/genetics , Gene Library , Genetic Markers , Genome , Microsatellite Repeats/genetics , Multigene Family , Plants/genetics , Polymorphism, Genetic , Species Specificity
5.
Genetics ; 190(1): 113-27, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22021387

ABSTRACT

Genetic nonself recognition systems such as vegetative incompatibility operate in many filamentous fungi to regulate hyphal fusion between genetically dissimilar individuals and to restrict the spread of virulence-attenuating mycoviruses that have potential for biological control of pathogenic fungi. We report here the use of a comparative genomics approach to identify seven candidate polymorphic genes associated with four vegetative incompatibility (vic) loci of the chestnut blight fungus Cryphonectria parasitica. Disruption of candidate alleles in one of two strains that were heteroallelic at vic2, vic6, or vic7 resulted in enhanced virus transmission, but did not prevent barrage formation associated with mycelial incompatibility. Detailed characterization of the vic6 locus revealed the involvement of nonallelic interactions between two tightly linked genes in barrage formation, heterokaryon formation, and asymmetric, gene-specific influences on virus transmission. The combined results establish molecular identities of genes associated with four C. parasitica vic loci and provide insights into how these recognition factors interact to trigger incompatibility and restrict virus transmission.


Subjects
Ascomycota/genetics , Ascomycota/virology , Fungal Proteins/genetics , Alleles , Epistasis, Genetic , Fungal Proteins/metabolism , Genetic Loci , Genotype , Genotyping Techniques , Models, Biological , Molecular Sequence Data , Mutation , Polymorphism, Genetic
6.
PLoS One ; 6(7): e22573, 2011.
Article in English | MEDLINE | ID: mdl-21818341

ABSTRACT

BACKGROUND: The fat body is the main organ of intermediary metabolism in insects and the principal source of hemolymph proteins. As part of our ongoing efforts to understand mosquito fat body physiology and to identify novel targets for insect control, we have conducted a transcriptome analysis of the fat body of Aedes aegypti before and in response to blood feeding. RESULTS: We created two non-normalized fat body EST libraries, one from non-blood-fed (NBF) mosquitoes and another from mosquitoes 24 h post-blood meal (PBM). 454 pyrosequencing of the non-normalized libraries resulted in 204,578 useable reads from the NBF sample and 323,474 useable reads from the PBM sample. Alignment of reads to the existing reference Ae. aegypti transcript libraries for analysis of differential expression between NBF and PBM samples revealed 116,912 and 115,051 matches, respectively. De novo assembly of the reads from the NBF sample resulted in 15,456 contigs, and assembly of the reads from the PBM sample resulted in 15,010 contigs. Collectively, 123 novel transcripts were identified within these contigs. Prominently expressed transcripts in the NBF fat body library were represented by transcripts encoding ribosomal proteins. In the PBM library, 35.4% of all reads were represented by transcripts that encode yolk proteins. The most highly expressed were transcripts encoding members of the cathepsin b, vitellogenin, vitellogenic carboxypeptidase, and vitelline membrane protein families. CONCLUSION: The two fat body transcriptomes differed considerably from each other in both the genes expressed and the abundances of their transcripts. They reflect the physiological shift of the pre-feeding fat body from a resting state to vitellogenic gene expression after feeding.


Subjects
Aedes/genetics , Fat Body/metabolism , Feeding Behavior , Transcriptome , Yellow Fever/parasitology , Animals , Chickens , Contig Mapping , DNA, Complementary/genetics , Gene Expression Regulation , Genes, Insect/genetics , Immunity/genetics , Insect Proteins/genetics , Insect Proteins/metabolism , Membrane Transport Proteins/genetics , Membrane Transport Proteins/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA
7.
Nucleic Acids Res ; 39(16): 7077-91, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21609956

ABSTRACT

GC 5' splice sites (5'ss) are present in ∼1% of human introns, but factors promoting their efficient selection are poorly understood. Here, we describe a case of X-linked agammaglobulinemia resulting from a GC 5'ss activated by a mutation in BTK intron 3. This GC 5'ss was intrinsically weak, yet it was selected in >90% of primary transcripts in the presence of a strong and intact natural GT counterpart. We show that efficient selection of this GC 5'ss required a high density of GAA/CAA-containing splicing enhancers in the exonized segment and was promoted by SR proteins 9G8, Tra2β and SC35. The GC 5'ss was efficiently inhibited by splice-switching oligonucleotides targeting either the GC 5'ss itself or the enhancer. Comprehensive analysis of natural GC-AG introns and previously reported pathogenic GC 5'ss showed that their efficient activation was facilitated by higher densities of splicing enhancers and lower densities of silencers than their GT 5'ss equivalents. Removal of the GC-AG introns was promoted to a minor extent by the splice-site strength of adjacent exons and inhibited by flanking Alu repeats, with the first downstream Alus located on average at a longer distance from the GC 5'ss than other transposable elements. These results provide new insights into the splicing code that governs selection of noncanonical splice sites.


Subjects
Agammaglobulinemia/genetics , Genetic Diseases, X-Linked/genetics , RNA Splice Sites , Agammaglobulinaemia Tyrosine Kinase , Cell Line , Humans , Interspersed Repetitive Sequences , Introns , Oligonucleotides, Antisense , Point Mutation , Protein-Tyrosine Kinases/genetics , RNA Splicing , Regulatory Sequences, Ribonucleic Acid
8.
BMC Bioinformatics ; 11: 22, 2010 Jan 12.
Article in English | MEDLINE | ID: mdl-20067640

ABSTRACT

BACKGROUND: Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons. RESULTS: Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns. CONCLUSIONS: We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.


Subjects
Gene Expression Profiling/methods , Genetic Variation , Genome, Human , RNA Splicing/genetics , Breast Neoplasms , Databases, Genetic , Genetic Predisposition to Disease , Humans , Protein Isoforms/genetics
9.
J Mol Biol ; 396(5): 1410-21, 2010 Mar 12.
Article in English | MEDLINE | ID: mdl-20004669

ABSTRACT

In Darwinian evolution, mutations occur approximately at random in a gene and are translated into amino acid changes through the genetic code. Some mutations are fixed to become substitutions and some are eliminated from the population. Partitioning pairs of closely related species with complete genome sequences by average population size of each pair, we looked at the substitution matrices generated for these partitions and compared the substitution patterns between species. We estimated a population genetic model that relates the relative fixation probabilities of different types of mutations to the selective pressure and population size. Parameterizations of the average and distribution of selective pressures for different amino acid substitution types in different population size comparisons were generated with a Bayesian framework. We found that partitions in population size as well as in substitution type are required to explain the substitution data. Selection coefficients were found to decrease with increasingly radical amino acid substitution and with increasing effective population size. To further explore the role of underlying processes in amino acid substitution, we analyzed embryophyte (plant) gene families from TAED (The Adaptive Evolution Database), where solved structures for at least one member exist in the Protein Data Bank. Using PAML, we assigned branches to three categories: strong negative selection, moderate negative selection/neutrality, and positive diversifying selection. Focusing on the first and third categories, we identified sites changing along gene family lineages and observed the spatial patterns of substitution. Selective sweeps were expected to create primary sequence clustering under positive diversifying selection. Co-evolution through direct physical interaction was expected to cause tertiary structural clustering. Under both positive and negative selection, the substitution patterns were found to be nonrandom. Under positive diversifying selection, significant independent signals were found for primary and tertiary sequence clustering, suggesting roles for both selective sweeps and direct physical interaction. Under strong negative selection, the signals were not found to be independent. Altogether, a complex interplay of population genetic and protein thermodynamic forces is suggested.


Subjects
Amino Acid Substitution , Evolution, Molecular , Proteins/genetics , Animals , Databases, Genetic , Databases, Protein , Genetics, Population , Genome, Plant , Humans , Models, Genetic , Models, Molecular , Multigene Family , Plant Proteins/chemistry , Plant Proteins/genetics , Population Density , Protein Structure, Tertiary , Proteins/chemistry , Selection, Genetic
10.
BMC Genomics ; 10: 508, 2009 Nov 04.
Article in English | MEDLINE | ID: mdl-19889216

ABSTRACT

BACKGROUND: Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries. RESULTS: A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that their mutations in 21/23 (91.3%) cases altered the splicing pattern as expected. Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks. CONCLUSION: Differences in intronic enhancers supporting 5' and 3' splice sites suggest an independent splicing commitment for neighboring exons. Increased evolutionary conservation for ISEs/ISSs within intronic flanks and the effect of modulation of predicted elements on splicing suggest functional significance of the identified elements in splicing regulation. Most of the elements identified were shown to have direct implications in human splicing and therefore could be useful for building computational splicing models in biomedical research.


Subjects
Computer Simulation , Models, Genetic , RNA Splicing/genetics , Vertebrates/genetics , Animals , Base Sequence , Binding Sites , Conserved Sequence , Exons/genetics , Humans , Introns/genetics , RNA, Messenger/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism
11.
BMC Bioinformatics ; 9 Suppl 9: S13, 2008 Aug 12.
Article in English | MEDLINE | ID: mdl-18793458

ABSTRACT

BACKGROUND: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties with potential implications for DNA sequencing. The alpha-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern with readily distinguishable modes of toggling. Effective processing of such signals requires developing machine learning methods capable of learning the various blockade modes for classification and knowledge discovery purposes. Here we propose a method aimed at improving our stochastic analysis capabilities to better understand the discriminatory capabilities of the observed nanopore channel interactions with analyte. RESULTS: We tailored our memory-sparse distributed implementation of a Mixture of Hidden Markov Models (MHMMs) to the problem of channel current blockade clustering and associated analyte classification. By using probabilistic fully connected HMM profiles as mixture components we were able to cluster the various 9 base-pair hairpin channel blockades. We obtained very high Maximum a Posteriori (MAP) classification with a mixture of 12 different channel blockade profiles, each with 4 levels, a configuration that can be computed with sufficient speed for real-time experimental feedback. MAP classification performance depends on several factors such as the number of mixture components, the number of levels in each profile, and the duration of a channel blockade event. We distribute Baum-Welch Expectation Maximization (EM) algorithms running on our model in two ways. A distributed implementation of the MHMM data processing accelerates data clustering efforts. The second, simultaneous, strategy uses an EM checkpointing algorithm to lower the memory use and efficiently distribute the bulk of EM processing in processing large data sequences (such as for the progressive sums used in the HMM parameter estimates). CONCLUSION: The proposed distributed MHMM method has many appealing properties, such as precise classification of analyte in real-time scenarios, and the ability to incorporate new domain knowledge into a flexible, easily distributable, architecture. The distributed HMM provides a feature extraction that is equivalent to that of the sequential HMM with a speedup factor approximately equal to the number of independent CPUs operating on the data. The MHMM topology learns clusters existing within data samples via distributed HMM EM learning. A Java implementation of the MHMM algorithm is available at http://logos.cs.uno.edu/~achurban.
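
The MAP classification step mentioned in this abstract can be sketched generically: given the log-likelihood of one blockade signal under each mixture component and the component priors, the MAP class is the component with the largest posterior. The sketch below is an illustration under that assumption, not the authors' Java implementation; the function name `map_classify` is hypothetical, and the log-likelihoods are assumed to come from a separate HMM forward pass.

```python
import math


def map_classify(log_likelihoods, priors):
    """Return (best_index, posteriors) for one observed signal.

    log_likelihoods[k] is log P(signal | component k); priors[k] is P(k).
    Hypothetical helper for illustration only.
    """
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Log-sum-exp: subtract the maximum before exponentiating so that very
    # negative log-likelihoods (long signals) do not underflow to zero.
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint))
    posteriors = [math.exp(v - log_norm) for v in log_joint]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors
```

Working in log space matters here because the likelihood of a long blockade event is far smaller than the smallest representable double.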


Subjects
Algorithms , Biological Assay/methods , DNA/chemistry , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics , Hemolysin Proteins/chemistry , Hemolysin Proteins/genetics , Ion Channel Gating/genetics , Ion Channels/chemistry , Ion Channels/genetics , Sequence Analysis, DNA/methods , Artificial Intelligence , Base Sequence , Ion Channels/analysis , Markov Chains , Molecular Sequence Data , Pattern Recognition, Automated/methods
12.
Biol Direct ; 3: 30, 2008 Jul 09.
Article in English | MEDLINE | ID: mdl-18613975

ABSTRACT

The GT dinucleotide in the first two intron positions is the most conserved element of the U2 donor splice signals. However, in a small fraction of donor sites, GT is replaced by GC. A substantial enrichment of GC in donor sites of alternatively spliced genes has been observed previously in human, nematode and Arabidopsis, suggesting that GC signals are important for regulation of alternative splicing. We used parsimony analysis to reconstruct evolution of donor splice sites and inferred 298 GT > GC conversion events compared to 40 GC > GT conversion events in primate and rodent genomes. Thus, there was substantive accumulation of GC donor splice sites during the evolution of mammals. Accumulation of GC sites might have been driven by selection for alternative splicing. REVIEWERS: This article was reviewed by Jerzy Jurka and Anton Nekrutenko. For the full reviews, please go to the Reviewers' Reports section.
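
How parsimony can assign a direction to a donor-site change is easiest to see on a toy tree. The sketch below uses Fitch's small-parsimony algorithm on a single site; it is an assumption-laden illustration, not the procedure used in the study. The tree shape, the function names, and the deterministic tie-break for ambiguous ancestors are all hypothetical.

```python
def fitch_sets(tree):
    """Bottom-up Fitch pass: return (state_set, children) for each node.

    A leaf is a string such as "GT"; an internal node is a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return ({tree}, [])
    left, right = (fitch_sets(child) for child in tree)
    common = left[0] & right[0]
    return (common if common else left[0] | right[0], [left, right])


def count_conversions(node, parent_state=None, counts=None):
    """Top-down pass: fix one state per node and tally directed changes."""
    if counts is None:
        counts = {}
    states, children = node
    if parent_state in states:
        state = parent_state
    else:
        state = sorted(states)[0]  # arbitrary but deterministic tie-break
        if parent_state is not None:
            key = (parent_state, state)
            counts[key] = counts.get(key, 0) + 1
    for child in children:
        count_conversions(child, state, counts)
    return counts
```

For a tree in which one ingroup species carries GC and its sister and two outgroups carry GT, `count_conversions(fitch_sets(((("GC", "GT"), "GT"), "GT")))` reports a single GT-to-GC event, because GT is the most parsimonious ancestral state.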


Subjects
Dinucleotide Repeats/genetics , Evolution, Molecular , Mammals/genetics , RNA Splice Sites/genetics , Alternative Splicing/genetics , Animals , Cattle , Conserved Sequence , Dogs , Humans , Introns/genetics , Macaca mulatta , Mice , Rats
13.
Gene ; 418(1-2): 22-6, 2008 Jul 15.
Article in English | MEDLINE | ID: mdl-18486364

ABSTRACT

Total evidence and the use of large datasets to overcome uncertainty are the state of the art in systematic analysis. This assumes that the only true phylogenetic signal is ancestry and that functional, structural, and other factors will not add an alternative signal. Using gene families, where individual codon positions were sorted into bins based upon average-pairwise dN/dS ratio, we show that standard, common phylogenetic methods that were designed for stochastic, neutral, site-independent processes, generate less robust phylogenetic signal for bins with strong negative or positive selection. This was true for phylogenetic reconstruction with parsimony, distance, and likelihood methods. Further, we present a case for the potential existence of systematic functional or structural signal that competes with ancestral signal. For the example of positive selection, we simulate the evolution of sequences through three dimensional lattice constructs with folding constraint and changing binding functionality and show that total evidence for these lattice genes presents trees with functional signal, but that the neutral synonymous sites in these genes show the true ancestral signal. In this case, sequence convergence is promoted by functional convergence.


Subjects
Biological Evolution , Models, Genetic , Phylogeny , Selection, Genetic , Animals , Chordata/genetics , Databases, Nucleic Acid
14.
BMC Bioinformatics ; 9: 224, 2008 Apr 30.
Article in English | MEDLINE | ID: mdl-18447951

ABSTRACT

BACKGROUND: The Baum-Welch learning procedure for Hidden Markov Models (HMMs) provides a powerful tool for tailoring HMM topologies to data for use in knowledge discovery and clustering. A linear memory procedure recently proposed by Miklós, I. and Meyer, I.M. describes a memory sparse version of the Baum-Welch algorithm with modifications to the original probabilistic table topologies to make memory use independent of sequence length (and linearly dependent on state number). The original description of the technique has some errors that we amend. We then compare the corrected implementation on a variety of data sets with conventional and checkpointing implementations. RESULTS: We provide a correct recurrence relation for the emission parameter estimate and extend it to parameter estimates of the Normal distribution. To accelerate estimation of the prior state probabilities, and decrease memory use, we reverse the originally proposed forward sweep. We describe different scaling strategies necessary in all real implementations of the algorithm to prevent underflow. In this paper we also describe our approach to a linear memory implementation of the Viterbi decoding algorithm (with linearity in the sequence length, while memory use is approximately independent of state number). We demonstrate the use of the linear memory implementation on an extended Duration Hidden Markov Model (DHMM) and on an HMM with a spike detection topology. Comparing the various implementations of the Baum-Welch procedure we find that the checkpointing algorithm produces the best overall tradeoff between memory use and speed. In cases where sequence length is very large (for Baum-Welch), or state number is very large (for Viterbi), the linear memory methods outlined may offer some utility. CONCLUSION: Our performance-optimized Java implementations of the Baum-Welch algorithm are available at http://logos.cs.uno.edu/~achurban. The described method and implementations will aid sequence alignment, gene structure prediction, HMM profile training, nanopore ionic flow blockades analysis and many other domains that require efficient HMM training with EM.
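
The need for scaling that this abstract mentions can be shown with the forward pass alone. The sketch below is one common scaling strategy (renormalizing the forward vector at every step and accumulating the log of the scale factors); it is not taken from the paper, whose implementations are in Java, and the function name `forward_loglik` is hypothetical. Because only the current forward vector is kept, memory use in this sketch does not grow with sequence length.

```python
import math


def forward_loglik(obs, start, trans, emit):
    """Scaled forward pass; returns log P(obs | model) for a discrete HMM.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][o]: probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale so the vector sums to 1; the product of the scale factors
        # equals the sequence likelihood, so their logs sum to log P(obs).
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik
```

Without the rescaling step, the forward values for a sequence of a few thousand symbols would underflow to zero in double precision.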


Subjects
DNA/ultrastructure , Information Storage and Retrieval/methods , Markov Chains , Neural Networks, Computer , Software Design , Algorithms , Artificial Intelligence , Cluster Analysis , Computers/statistics & numerical data , DNA/analysis , DNA/chemistry , Data Interpretation, Statistical , Electric Impedance , Information Storage and Retrieval/statistics & numerical data , Ion Channel Gating , Likelihood Functions , Linear Models , Models, Molecular , Normal Distribution , Nucleic Acid Conformation , Pattern Recognition, Automated/methods , Pattern Recognition, Automated/statistics & numerical data , Sequence Alignment , Sequence Analysis, DNA , Weights and Measures
15.
BMC Bioinformatics ; 8 Suppl 7: S14, 2007 Nov 01.
Article in English | MEDLINE | ID: mdl-18047713

ABSTRACT

BACKGROUND: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties, with potential implications for DNA sequencing. The alpha-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern. Typically, recorded current blockade signals have several levels of blockade, with various durations, all obeying a fixed statistical profile for a given molecule. Hidden Markov Model (HMM) based duration learning experiments on artificial two-level Gaussian blockade signals helped us to identify a proper modeling framework. We then apply our framework to the real multi-level DNA hairpin blockade signal. RESULTS: The identified upper level blockade state is observed with durations that are geometrically distributed (consistent with a physical decay process for remaining in any given state). We show that a mixture of convolution chains of geometrically distributed states is better for representing multimodal long-tailed duration phenomena. Based on learned HMM profiles we are able to classify 9 base-pair DNA hairpins with accuracy up to 99.5% on signals from same-day experiments. CONCLUSION: We have demonstrated several implementations for de novo estimation of the duration distribution probability density function within the HMM framework and applied our model topology to the real data. The proposed design could be useful in molecular analysis based on nanopore current blockade signals.
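
The link between HMM states and geometric durations follows from the self-transition probability: a state that stays put with probability p has duration d with probability p^(d-1)(1-p) and mean duration 1/(1-p). Chaining two such states convolves their duration distributions, which yields a peaked rather than monotonically decaying shape. The sketch below illustrates only this textbook fact; the function names are hypothetical and the value 0.9 is an arbitrary example, not a parameter from the paper.

```python
def geometric_duration_pmf(self_prob, max_len):
    """P(duration = d) for d = 1..max_len in a state with given self-loop."""
    return [self_prob ** (d - 1) * (1.0 - self_prob) for d in range(1, max_len + 1)]


def convolve(pmf_a, pmf_b):
    """Duration pmf of passing through state A and then state B.

    Inputs and output use index k for duration k + 1.
    """
    out = [0.0] * (len(pmf_a) + len(pmf_b))
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            # Durations (i + 1) + (j + 1) = i + j + 2 live at index i + j + 1.
            out[i + j + 1] += a * b
    return out
```

A mixture of several such chains, each with its own weight, is one way to approximate multimodal duration distributions of the kind the abstract describes.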


Subjects
Algorithms , Nucleotide Aptamers/analysis , Artificial Intelligence , Bacterial Toxins/chemistry , Biosensing Techniques/instrumentation , Hemolysin Proteins/chemistry , Nanotechnology/instrumentation , Nucleotide Aptamers/immunology , Equipment Design , Equipment Failure Analysis , Ion Channel Gating , Kinetics , Membrane Potentials , Nanotechnology/methods , Automated Pattern Recognition/methods , Porosity , Reproducibility of Results , Sensitivity and Specificity , Transducers
16.
Biol Direct ; 1: 10, 2006 Apr 03.
Article in English | MEDLINE | ID: mdl-16584568

ABSTRACT

BACKGROUND: Predicting and properly ranking canonical splice sites (SSs) is a challenging problem in the bioinformatics and machine learning communities. Any progress in SS recognition will lead to a better understanding of the splicing mechanism. We introduce several new approaches for combining a priori knowledge to improve SS detection. First, we design a new Bayesian SS sensor based on oligonucleotide counting. To further enhance prediction quality, we apply our new de novo motif detection tool, MHMMotif, to intronic ends and exons. We combine the elements found with sensor information using a Naive Bayesian Network, as implemented in our new tool SpliceScan. RESULTS: According to our tests, the Bayesian sensor outperforms the contemporary Maximum Entropy sensor for 5' SS detection. We report a number of putative Exonic (ESE) and Intronic (ISE) Splicing Enhancers found by the MHMMotif tool. T-test statistics on mouse/rat intronic alignments indicate that the detected elements are on average more conserved than other oligos, which supports our assumption of their functional importance. The tool has been shown to outperform the SpliceView, GeneSplicer, NNSplice, Genio and NetUTR tools on the test set of human genes. SpliceScan outperforms all contemporary ab initio gene structural prediction tools on the set of 5' UTR gene fragments. CONCLUSION: The methods we designed have many attractive properties compared to existing approaches. The Bayesian sensor, the MHMMotif program and the SpliceScan tool are freely available on our web site. REVIEWERS: This article was reviewed by Manyuan Long, Arcady Mushegian and Mikhail Gelfand.
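The abstract does not specify how the Bayesian sensor is built, so the following Python sketch only illustrates the general idea of scoring a candidate site from oligonucleotide counts under a naive Bayes independence assumption. It should not be read as the SpliceScan method. The function names, the pseudocount, the choice of k = 1 and the toy training windows are all hypothetical.

```python
import math
from collections import Counter


def train_counts(windows, k):
    """Count the k-mer seen at each offset across equal-length training windows."""
    length = len(windows[0])
    return [Counter(w[i:i + k] for w in windows) for i in range(length - k + 1)]


def log_odds(window, site_counts, decoy_counts, n_sites, n_decoys, k, pseudo=1.0):
    """Sum over offsets of log P(k-mer | true site) / P(k-mer | decoy), with pseudocounts."""
    vocab = 4 ** k  # number of possible DNA k-mers
    score = 0.0
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        p_site = (site_counts[i][kmer] + pseudo) / (n_sites + pseudo * vocab)
        p_decoy = (decoy_counts[i][kmer] + pseudo) / (n_decoys + pseudo * vocab)
        score += math.log(p_site / p_decoy)
    return score


# Toy data, invented for illustration only: 9-nt windows around a GT dinucleotide.
K = 1
TRUE_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
DECOYS = ["CAGGTCCCT", "TTGGTTTCA", "ACGGTACCG"]
SITE_COUNTS = train_counts(TRUE_SITES, K)
DECOY_COUNTS = train_counts(DECOYS, K)


def score(window):
    """Positive values favour a true site, negative values favour a decoy."""
    return log_odds(window, SITE_COUNTS, DECOY_COUNTS, len(TRUE_SITES), len(DECOYS), K)
```

With k = 1 this reduces to a log-odds weight matrix. Larger k captures dependencies between neighbouring positions, but overlapping k-mers are then no longer independent, so the summed score is only an approximation.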

17.
BMC Bioinformatics ; 6: 261, 2005 Oct 21.
Article in English | MEDLINE | ID: mdl-16242044

ABSTRACT

BACKGROUND: Accurate and automatic gene finding and structural prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons and partial gene structure predictions that span several genomic clones. RESULTS: We present an mRNA/DNA homology-based gene structure prediction tool, GIGOgene. We use a new affine gap penalty splice-enhanced global alignment algorithm running in linear memory for high-quality annotation of splice sites. Our tool includes a novel algorithm to assemble partial gene structure predictions using interval graphs. GIGOgene exhibited a sensitivity of 99.08% and a specificity of 99.98% on the Genie learning set, and demonstrated a higher quality of gene structural prediction when compared to Sim4, est2genome, Spidey, Galahad and BLAT, including when genes contained micro-exons and non-canonical splice sites. GIGOgene showed an acceptable loss of prediction quality when confronted with a noisy Genie learning set simulating ESTs. CONCLUSION: GIGOgene shows a higher quality of gene structure prediction for mRNA/DNA spliced alignment when compared to other available tools.
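As background to the alignment step mentioned above, here is a minimal Python sketch of global alignment scoring with affine gap penalties, keeping only one previous row per matrix so that memory grows linearly with the shorter dimension. This is the textbook three-state recurrence, not GIGOgene's algorithm: it has no splice-site enhancement, it returns only the score, and the scoring parameters are arbitrary assumptions. Recovering the alignment itself in linear memory needs an additional divide-and-conquer step that is not shown.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score; a gap of length L costs gap_open + L * gap_extend."""
    m = len(b)
    # Row 0. M: ends in an aligned pair; X: ends with a[i] against a gap;
    # Y: ends with b[j] against a gap.
    M = [0] + [NEG_INF] * m
    X = [NEG_INF] * (m + 1)
    Y = [NEG_INF] + [gap_open + gap_extend * j for j in range(1, m + 1)]
    for i in range(1, len(a) + 1):
        new_m = [NEG_INF] * (m + 1)
        new_x = [gap_open + gap_extend * i] + [NEG_INF] * m
        new_y = [NEG_INF] * (m + 1)
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            new_m[j] = s + max(M[j - 1], X[j - 1], Y[j - 1])
            # Extend an existing gap in b, or open a new one from the row above.
            new_x[j] = max(M[j] + gap_open, Y[j] + gap_open, X[j]) + gap_extend
            # Extend an existing gap in a, or open a new one from the cell to the left.
            new_y[j] = max(new_m[j - 1] + gap_open, new_x[j - 1] + gap_open,
                           new_y[j - 1]) + gap_extend
        M, X, Y = new_m, new_x, new_y
    return max(M[m], X[m], Y[m])
```

With the default parameters, aligning "ACGT" against "AT" scores -2: two matches (+2) plus one gap of length two (-2 to open, -1 per gapped position). Two separate single-base gaps would cost more, which is the behaviour affine penalties are meant to produce when long introns must be skipped in one piece.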


Subjects
DNA/chemistry , Messenger RNA/chemistry , Sequence Alignment , Nucleic Acid Sequence Homology , Software , Algorithms , Clone Cells/chemistry , Predictive Value of Tests
18.
Nucleic Acids Res ; 33(17): 5512-20, 2005.
Article in English | MEDLINE | ID: mdl-16186132

ABSTRACT

By comparing sequences of human, mouse and rat orthologous genes, we show that in 5'-untranslated regions (5'-UTRs) of mammalian cDNAs, but not in 3'-UTRs or coding sequences, AUG is conserved to a significantly greater extent than any of the other 63 nt triplets. This effect is likely to reflect, primarily, bona fide evolutionary conservation rather than cDNA annotation artifacts, because the excess of conserved upstream AUGs (uAUGs) is seen in 5'-UTRs containing stop codons in-frame with the start AUG, and many of the conserved AUGs are found in different frames, consistent with their location in authentic non-coding sequences. Altogether, conserved uAUGs are present in at least 20-30% of mammalian genes. Qualitatively similar results were obtained by comparison of orthologous genes from different species of the yeast genus Saccharomyces. Together with the observation that mammalian and yeast 5'-UTRs are significantly depleted in overall AUG content, these findings suggest that AUG triplets in 5'-UTRs are subject to the pressure of purifying selection in two opposite directions: the uAUGs that have no specific function tend to be deleterious and are eliminated during evolution, whereas those uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5'-UTRs. Consistent with this hypothesis, we found that open reading frames starting from conserved uAUGs are significantly shorter than those starting from non-conserved uAUGs, possibly owing to selection for optimization of the level of attenuation.
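The quantities discussed in this abstract (upstream AUGs, their frame relative to the main start codon, and the length of the open reading frame each one begins) can be extracted from a cDNA with a few lines of Python. This sketch is illustrative only and is not the authors' pipeline. The function name and the example sequence are hypothetical, and the cDNA is written in DNA letters, so AUG appears as ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_augs(cdna, cds_start):
    """List (position, frame, uorf_codons) for each ATG lying wholly inside the 5'-UTR.

    position    : 0-based index of the A in the upstream ATG
    frame       : (cds_start - position) % 3, where 0 means in-frame with the main start
    uorf_codons : codons from the ATG up to, but not including, the first in-frame
                  stop codon, or None if no stop is reached before the sequence ends
    """
    utr = cdna[:cds_start]
    found = []
    for pos in range(len(utr) - 2):
        if utr[pos:pos + 3] != "ATG":
            continue
        frame = (cds_start - pos) % 3
        uorf_codons = None
        # Read codon by codon from the upstream ATG; the uORF may run past the UTR.
        for k in range(pos, len(cdna) - 2, 3):
            if cdna[k:k + 3] in STOP_CODONS:
                uorf_codons = (k - pos) // 3
                break
        found.append((pos, frame, uorf_codons))
    return found


# Hypothetical 25-nt cDNA: a 13-nt 5'-UTR followed by the coding sequence ATGAAATTTTGA.
EXAMPLE_CDNA = "GGATGCCTAAGGCATGAAATTTTGA"
EXAMPLE_RESULT = upstream_augs(EXAMPLE_CDNA, cds_start=13)
```

For the example sequence this finds a single upstream ATG at position 2, out of frame with the main start codon, opening a four-codon reading frame that terminates just inside the coding region. Comparing such uORF lengths between conserved and non-conserved uAUGs is the kind of measurement the abstract refers to.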


Subjects
5' Untranslated Regions/chemistry , Initiator Codon , Molecular Evolution , Animals , Base Sequence , Terminator Codon , Conserved Sequence , Humans , Mice , Molecular Sequence Data , Rats , Sequence Alignment