1.
Cell ; 177(1): 32-37, 2019 03 21.
Article in English | MEDLINE | ID: mdl-30901545

ABSTRACT

The introduction of exome sequencing in the clinic has sparked tremendous optimism for the future of rare disease diagnosis, and there is an exciting opportunity to further leverage these advances. To provide diagnostic clarity to all patients with rare diseases, however, the field urgently needs to develop and implement strategies for understanding the mechanisms underlying all rare diseases and for translating this understanding into clinical care.


Subjects
Exome Sequencing/trends; Rare Diseases/diagnosis; Translational Research, Biomedical/methods; Exome; Genetic Testing; Genome, Human/genetics; High-Throughput Nucleotide Sequencing/trends; Humans; Rare Diseases/genetics; Sequence Analysis, DNA/methods; Exome Sequencing/methods
2.
Genome Res ; 27(12): 2015-2024, 2017 12.
Article in English | MEDLINE | ID: mdl-29097404

ABSTRACT

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts protein expression from the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million random 50-nucleotide 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure on protein expression. We trained a convolutional neural network (CNN) on the random library and showed that it predicts protein expression well for both a held-out set of the random 5' UTRs and native S. cerevisiae 5' UTRs. We also used the model to computationally evolve highly active 5' UTRs and confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of the model.
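The abstract names upstream open reading frames (uORFs) as one of the 5' UTR features whose effect on expression was quantified. As a rough illustration only, and not the authors' feature extraction or CNN, the Python sketch below scans a UTR for upstream ATG (AUG) start codons and, for each one, looks for an in-frame stop codon inside the UTR. The helper name `find_uorfs` and its output format are hypothetical.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr: str) -> List[Tuple[int, Optional[int]]]:
    """List upstream ATG start codons in a 5' UTR (hypothetical helper).

    Returns (start, stop_end) pairs: start is the 0-based index of the ATG,
    stop_end is the index just past the first in-frame stop codon, or None
    when the frame runs off the 3' end of the UTR without a stop.
    """
    seq = utr.upper().replace("U", "T")  # accept RNA or DNA alphabet
    found = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        stop_end = None
        # Walk codon by codon in the frame defined by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                stop_end = pos + 3
                break
        found.append((start, stop_end))
    return found


print(find_uorfs("CCATGAAATAGCCATGCC"))  # [(2, 11), (13, None)]
```

A pair with a stop index marks a uORF that closes inside the UTR; `None` marks an ATG whose frame runs off the end of the UTR, toward the downstream coding sequence. Context around each ATG, such as the Kozak sequence, is not modeled here.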


Subjects
Models, Genetic; Saccharomyces cerevisiae/genetics; 5' Untranslated Regions; Alternative Splicing; Computer Simulation; Gene Library; Machine Learning; Neural Networks, Computer; RNA, Fungal; RNA, Messenger
3.
J Infect Dis ; 213(8): 1248-52, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-26655301

ABSTRACT

Outcomes of chronic infection with hepatitis B virus (HBV) vary, with increased morbidity reported in the context of human immunodeficiency virus (HIV) coinfection. The factors driving different outcomes are not well understood, but there is increasing interest in an HLA class I effect. We therefore studied the influence of HLA class I on HBV in an African HIV-positive cohort. We demonstrated that virologic markers of HBV disease activity (hepatitis B e antigen status or HBV DNA level) are associated with HLA-A genotype. This finding supports a role for the CD8(+) T-cell response in HBV control and may inform future therapeutic T-cell vaccine strategies.


Subjects
Coinfection; HIV Infections; HLA Antigens/genetics; Hepatitis B e Antigens/blood; Hepatitis B; Adult; Cohort Studies; Coinfection/complications; Coinfection/epidemiology; Coinfection/genetics; Coinfection/virology; Female; HIV Infections/complications; HIV Infections/epidemiology; HIV Infections/virology; Hepatitis B/complications; Hepatitis B/epidemiology; Hepatitis B/genetics; Hepatitis B/virology; Humans; Male; Prevalence; ROC Curve
4.
Proc Natl Acad Sci U S A ; 110(33): 13492-7, 2013 Aug 13.
Article in English | MEDLINE | ID: mdl-23878211

ABSTRACT

Experimental and computational evidence suggests that HLAs preferentially bind conserved regions of viral proteins, a concept we term "targeting efficiency," and that this preference may improve clearance of infection in several viral systems. To test this hypothesis, T-cell responses to A/H1N1 (2009) were measured in peripheral blood mononuclear cells obtained from a household cohort study performed during the 2009-2010 influenza season. We found that HLA targeting efficiency scores correlated significantly with IFN-γ enzyme-linked immunosorbent spot responses (P = 0.042, multiple regression). A further population-based analysis found that the carriage frequencies of the A*24 alleles, which had the lowest targeting efficiencies, were associated with pH1N1 mortality (r = 0.37, P = 0.031); these alleles are common in certain indigenous populations in which increased pH1N1 morbidity has been reported. HLA efficiency scores and HLA use are associated with CD8 T-cell magnitude in humans after influenza infection. The computational tools used in this study may be useful predictors of potential morbidity and may identify immunologic differences of new variant influenza strains more accurately than evolutionary sequence comparisons. Population-based studies of the relative frequency of these alleles in severe vs. mild influenza cases might advance clinical practice for severe H1N1 infections among genetically susceptible populations.


Subjects
CD4-Positive T-Lymphocytes/immunology; HLA Antigens/immunology; Influenza A Virus, H1N1 Subtype; Influenza, Human/epidemiology; Influenza, Human/immunology; Cohort Studies; Computational Biology/methods; Enzyme-Linked Immunospot Assay; Gene Frequency; HLA Antigens/metabolism; Humans; Interferon-gamma/immunology; Models, Statistical; Regression Analysis
5.
PLoS One ; 18(9): e0291169, 2023.
Article in English | MEDLINE | ID: mdl-37729186

ABSTRACT

Campaign contributions are a staple of congressional life. Yet the search for tangible effects of congressional donations often focuses on the association between contributions and votes on congressional bills. We present an alternative approach by considering the relationship between money and legislators' speech. Floor speeches are an important component of congressional behavior and reflect a legislator's policy priorities and positions in a way that voting cannot. Our research provides the first comprehensive analysis of the association between a legislator's campaign donors and the policy issues the legislator prioritizes in congressional speech. Ultimately, we find a robust relationship between donors and speech, indicating a more pervasive role of money in politics than previously assumed. We use a machine learning framework on a new dataset that brings together legislator metadata for all representatives in the US House between 1995 and 2018, including committee assignments, legislative speech, donation records, and information about Political Action Committees. We compare donation information against other potential explanatory variables, such as party affiliation, home state, and committee assignments, and find that donors consistently have the strongest association with legislators' issue attention. We further contribute a procedure for identifying speech and donation events that occur in close proximity to one another and share meaningful connections, identifying the proverbial needles in the haystack of congressional speech and donation activity that may be cases of interest for investigative journalism. Taken together, our framework, data, and findings can help increase transparency about the role of money in politics.


Subjects
Machine Learning; Tissue Donors; Humans; Metadata; Policy; Politics
6.
BMC Bioinformatics ; 13 Suppl 6: S11, 2012 Apr 19.
Article in English | MEDLINE | ID: mdl-22537040

ABSTRACT

Transcript quantification is a long-standing problem in genomics, and estimating the relative abundance of alternatively spliced isoforms of the same gene is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments, which are quickly generating large amounts of data. However, much of the signal in these data is corrupted or obscured by biases that result in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases through various task-specific approaches, which makes direct comparison between them difficult. Notably, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches to isoform quantification. We conduct this investigation by building models of increasing sophistication to account for noise introduced by the biases and comparing their accuracy to that of the established approaches. We focus on methods that estimate the expression of alternatively spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks.
On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches.
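The junction-only idea in the abstract can be illustrated with the simplest PSI estimator for a cassette exon. This is a sketch under stated assumptions, not one of the paper's probabilistic models: reads are assumed to be sampled uniformly across junctions, which is exactly the assumption the abstract says RNA-seq biases violate. The function name `psi_from_junctions` and the optional `pseudocount` parameter are illustrative.

```python
def psi_from_junctions(up_inc: int, inc_down: int, skip: int,
                       pseudocount: float = 0.0) -> float:
    """Percent spliced in (as a fraction) for one cassette exon, from junction reads only.

    up_inc:   reads spanning the upstream exon -> cassette exon junction
    inc_down: reads spanning the cassette exon -> downstream exon junction
    skip:     reads spanning the upstream exon -> downstream exon junction
    """
    # An inclusion isoform offers two junctions to sample reads from, a skipping
    # isoform only one, so the two inclusion counts are averaged.
    inclusion = (up_inc + inc_down) / 2.0 + pseudocount
    exclusion = skip + pseudocount
    total = inclusion + exclusion
    if total == 0:
        raise ValueError("no junction reads: PSI is undefined")
    return inclusion / total


print(psi_from_junctions(30, 10, 20))  # 0.5
```

Averaging the two inclusion junctions keeps an included exon from being counted twice against a skipping isoform that offers only one junction. The position-specific models described in the abstract replace these raw counts with bias-aware estimates.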


Subjects
Alternative Splicing; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Exons; Gene Expression Profiling; HeLa Cells; Humans; Models, Statistical; Reverse Transcriptase Polymerase Chain Reaction
7.
J Virol ; 85(3): 1310-21, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21084470

ABSTRACT

The high diversity of HLA binding preferences has been driven by the sequence diversity of short segments of relevant pathogenic proteins presented by HLA molecules to the immune system. To identify possible commonalities in HLA binding preferences, we quantify these using a novel measure termed "targeting efficiency," which captures the correlation between HLA-peptide binding affinities and the conservation of the targeted proteomic regions. Analysis of targeting efficiencies for 95 HLA class I alleles over thousands of human proteins and 52 human viruses indicates that HLA molecules preferentially target conserved regions in these proteomes, although the arboviral Flaviviridae are a notable exception where nonconserved regions are preferentially targeted by most alleles. HLA-A alleles and several HLA-B alleles that have maintained close sequence identity with chimpanzee homologues target conserved human proteins and DNA viruses such as Herpesviridae and Adenoviridae most efficiently, while all HLA-B alleles studied efficiently target RNA viruses. These patterns of host and pathogen specialization are both consistent with coevolutionary selection and functionally relevant in specific cases; for example, preferential HLA targeting of conserved proteomic regions is associated with improved outcomes in HIV infection and with protection against dengue hemorrhagic fever. Efficiency analysis provides a novel perspective on the coevolutionary relationship between HLA class I molecular diversity, self-derived peptides that shape T-cell immunity through ontogeny, and the broad range of viruses that subsequently engage with the adaptive immune response.


Subjects
Evolution, Molecular; Histocompatibility Antigens Class I/genetics; Histocompatibility Antigens Class I/immunology; Host-Pathogen Interactions; Proteins/genetics; Proteins/immunology; Viruses/immunology; Conserved Sequence; Humans; Protein Binding
8.
Cogsci ; 2021: 1767-1773, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34617074

ABSTRACT

A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings (e.g., the quantities corresponding to numerals) should be composed according to structured rules (e.g., order of operations). Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.

9.
J Immunol ; 181(9): 6361-70, 2008 Nov 01.
Article in English | MEDLINE | ID: mdl-18941227

ABSTRACT

Hepatitis C virus (HCV) vaccine efficacy may crucially depend on immunogen length and coverage of viral sequence diversity. However, covering a considerable proportion of the circulating viral sequence variants would likely require long immunogens, which, for the conserved portions of the viral genome, would contain redundant sequence information. In this study, we present the design and in vitro performance analysis of a novel "epitome" approach that compresses frequent immune targets of the cellular immune response against HCV into a shorter immunogen sequence. Compression of immunological information is achieved by partially overlapping the sequence motifs shared between individual epitopes. At the same time, sequence diversity coverage is provided by taking advantage of emerging cross-reactivity patterns among epitope variants, so that the epitope variants associated with the broadest cross-recognition of variants are preferentially included. Analysis of the processing and presentation of specific epitopes included in such a compressed, in vitro-expressed HCV epitome indicated effective processing of a majority of the tested epitopes, although re-presentation of some epitopes may require refined sequence design. Together, the present study establishes the epitome approach as a potentially powerful tool for vaccine immunogen design, especially suited to inducing cellular immune responses against highly variable pathogens.
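Compressing epitopes by overlapping the sequence they share is closely related to the shortest common superstring problem. The sketch below uses the classic greedy heuristic (repeatedly merge the pair of peptides with the longest suffix-prefix overlap) to illustrate the compression idea only; it is not the authors' epitome algorithm, which also weighs epitope frequency and variant cross-reactivity, and the function names are hypothetical.

```python
from typing import List


def _overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(epitopes: List[str]) -> str:
    """Greedily merge peptides into one sequence containing each as a substring."""
    # Drop duplicates and peptides already contained in another peptide.
    unique = list(dict.fromkeys(epitopes))
    frags = [p for p in unique if not any(p != q and p in q for q in unique)]
    if not frags:
        return ""
    while len(frags) > 1:
        # Find the ordered pair (a, b) with the largest suffix(a)/prefix(b) overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = _overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


print(greedy_superstring(["ABCDE", "DEFGH", "GHIJK"]))  # ABCDEFGHIJK
```

Here three 5-residue peptides fit into 11 residues instead of 15. The greedy merge is a heuristic: finding the truly shortest superstring is NP-hard, so the result is short but not guaranteed minimal, and it does not check whether the new junctions create unintended epitopes.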


Subjects
Antigen Presentation/immunology; Epitopes, T-Lymphocyte/biosynthesis; Epitopes, T-Lymphocyte/chemistry; Gene Expression Regulation/immunology; Hepacivirus/immunology; T-Lymphocytes, Cytotoxic/immunology; T-Lymphocytes, Cytotoxic/metabolism; Amino Acid Sequence; Cell Line; Epitopes, T-Lymphocyte/immunology; Epitopes, T-Lymphocyte/metabolism; HLA-B35 Antigen/biosynthesis; HLA-B35 Antigen/chemistry; HLA-B35 Antigen/immunology; HLA-B35 Antigen/metabolism; Hepacivirus/genetics; Hepatitis C, Chronic/immunology; Hepatitis C, Chronic/metabolism; Hepatitis C, Chronic/virology; Humans; Immunodominant Epitopes/biosynthesis; Immunodominant Epitopes/chemistry; Immunodominant Epitopes/immunology; Immunodominant Epitopes/metabolism; Molecular Sequence Data; Proteome/biosynthesis; Proteome/chemical synthesis; Proteome/immunology; Proteome/metabolism; T-Lymphocytes, Cytotoxic/virology; Viral Nonstructural Proteins/biosynthesis; Viral Nonstructural Proteins/chemical synthesis; Viral Nonstructural Proteins/immunology; Viral Nonstructural Proteins/metabolism
10.
IEEE Trans Pattern Anal Mach Intell ; 41(12): 3086-3099, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30130178

ABSTRACT

A new unified video analytics framework (ER3) is proposed for complex event retrieval, recognition and recounting, based on the proposed video imprint representation, which exploits temporal correlations among image features across video frames. The video imprint representation makes it straightforward to map back to both temporal and spatial locations in video frames, allowing both key frame identification and localization of key areas within each frame. In the proposed framework, a dedicated feature alignment module removes redundancy across frames to produce the tensor representation, i.e., the video imprint. The video imprint is then fed separately into a reasoning network and a feature aggregation module, for the event recognition/recounting and event retrieval tasks, respectively. Thanks to an attention mechanism inspired by the memory networks used in language modeling, the proposed reasoning network can simultaneously recognize the event category and localize the key pieces of evidence for event recounting. In addition, the latent structure in the reasoning network highlights areas of the video imprint that can be used directly for event recounting. For the event retrieval task, the compact video representation aggregated from the video imprint yields better retrieval results than existing state-of-the-art methods.

11.
PLoS Comput Biol ; 3(4): e75, 2007 Apr 27.
Article in English | MEDLINE | ID: mdl-17465674

ABSTRACT

The ability of human immunodeficiency virus type 1 (HIV-1) to develop high levels of genetic diversity, and thereby acquire mutations to escape immune pressures, contributes to the difficulties in producing a vaccine. Possibly no single HIV-1 sequence can induce sufficiently broad immunity to protect against a wide variety of infectious strains, or block the mutational escape pathways available to the virus after infection. The authors describe the generation of HIV-1 immunogens that minimize the phylogenetic distance to viral strains throughout the known viral population (the center of tree [COT]) and then extend the COT immunogen by adding a composite sequence that includes high-frequency variable sites preserved in their native contexts. The resulting COT(+) antigens compress the variation found in many independent HIV-1 isolates into lengths suitable for vaccine immunogens. It is possible to capture 62% of the variation found in the Nef protein and 82% of the variation in the Gag protein in immunogens of three gene lengths. The authors put forward immunogen designs that maximize representation of the diverse antigenic features present in a spectrum of HIV-1 strains. These immunogens should elicit immune responses against high-frequency viral strains as well as against most mutant forms of the virus.
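The abstract reports the percentage of Nef and Gag variation captured by immunogens of a given length but does not say how capture is scored. One common way to score such coverage, assumed for this sketch and not necessarily the paper's metric, is the share of length-k peptides (k = 9 by default, a typical class I epitope length) in a set of population sequences that also appear in the immunogen set. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_coverage(population: Iterable[str], immunogens: Iterable[str], k: int = 9) -> float:
    """Fraction of population k-mers (each counted once per sequence, so common
    variants weigh more) that also occur somewhere in the immunogen set."""
    counts = Counter()
    for seq in population:
        counts.update(kmers(seq, k))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population has no k-mers (are all sequences shorter than k?)")
    covered_kmers = set()
    for imm in immunogens:
        covered_kmers |= kmers(imm, k)
    covered = sum(n for kmer, n in counts.items() if kmer in covered_kmers)
    return covered / total
```

Counting each k-mer once per sequence weights common variants more heavily than rare ones, in line with the abstract's emphasis on high-frequency variable sites.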


Subjects
AIDS Vaccines/genetics; AIDS Vaccines/immunology; Antigenic Variation/genetics; Epitope Mapping/methods; Gene Products, nef/genetics; Gene Products, nef/immunology; Genetic Variation/genetics; Drug Design; nef Gene Products, Human Immunodeficiency Virus
12.
Mol Biochem Parasitol ; 155(2): 103-12, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17669514

ABSTRACT

VAR2CSA is the main candidate for a pregnancy malaria vaccine, but vaccine development may be complicated by sequence polymorphism. Here, we obtained partial or full-length var2CSA sequences from 106 parasites and applied novel computational methods and three-dimensional modeling to investigate VAR2CSA geographic variation and selection pressure. Our analysis reveals structural patterns of VAR2CSA sequence variation in which polymorphic sites group into segments of limited diversity. Within these segments, two or three basic types characterize a substantial majority of the parasite samples. Comparison to the primate malaria Plasmodium reichenowi shows that these basic types have ancient origins. Globally, var2CSA genes are composed of a mosaic of these ancestral polymorphic segments that have recombined extensively between var2CSA alleles. Three-dimensional modeling reveals that polymorphic segments concentrate in flexible loops at characteristic locations in the six VAR2CSA Duffy binding-like (DBL) adhesion domains. Individual DBL domain surfaces have distinct patterns of diversifying selection, suggesting that limited and differing portions of each DBL domain are targeted by host antibody. Since standard phylogenetic tree analysis is inadequate for highly recombining genes like var2CSA, we developed a novel phylogenetic approach that incorporates recombination and tracks new mutations in segment types. In the resulting tree, P. reichenowi is confirmed as an outlier and African and Asian P. falciparum isolates have slightly diverged. These findings validate a new approach to modeling protein evolution in the presence of frequent recombination and provide a clearer understanding of how var gene products function as immunoevasive binding ligands.


Subjects
Antigens, Protozoan/genetics; Antigens, Protozoan/immunology; Malaria/parasitology; Plasmodium falciparum/genetics; Polymorphism, Genetic; Pregnancy Complications, Parasitic/immunology; Selection, Genetic; Animals; Antigens, Protozoan/chemistry; Computational Biology/methods; DNA, Protozoan/chemistry; DNA, Protozoan/genetics; Female; Geography; Humans; Malaria/immunology; Malaria Vaccines/immunology; Models, Molecular; Molecular Sequence Data; Phylogeny; Plasmodium falciparum/isolation & purification; Pregnancy; Pregnancy Complications, Parasitic/prevention & control; Protein Structure, Tertiary; Sequence Analysis, DNA; Sequence Homology, Amino Acid
13.
Bioinformatics ; 22(14): e227-35, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873476

ABSTRACT

MOTIVATION AND RESULTS: Motivated by the ability of a simple threading approach to predict MHC I-peptide binding, we developed a new and improved structure-based model whose parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes that were used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make our model appropriate for predicting binding for alleles for which very little data (if any) is available beyond their sequence, including alleles for which 3D structures are not available. The ability of our model to generalize beyond the MHC types for which training data are available also distinguishes our approach from epitope prediction methods that treat MHC alleles as symbolic types rather than biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort and over 1000 HIV clade B sequences from the Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system. AVAILABILITY: http://www.research.microsoft.com/~jojic/hlaBinding.html.


Subjects
Algorithms; Histocompatibility Antigens Class I/chemistry; Models, Chemical; Models, Molecular; Peptides/chemistry; Sequence Analysis, Protein/methods; Software; Amino Acid Sequence; Artificial Intelligence; Binding Sites; Computer Simulation; Molecular Sequence Data; Pattern Recognition, Automated/methods; Protein Binding; Protein Conformation; Sequence Alignment/methods
14.
Artif Intell Med ; 70: 1-11, 2016 06.
Article in English | MEDLINE | ID: mdl-27431033

ABSTRACT

OBJECTIVE: High-throughput technologies have generated an unprecedented amount of high-dimensional gene expression data. Algorithmic approaches could be extremely useful for distilling information and deriving compact, interpretable representations of the statistical patterns present in the data. This paper proposes a mining approach to extract an informative representation of gene expression profiles based on a generative model called the Counting Grid (CG). METHOD: Using the CG model, gene expression values are arranged on a discrete grid, learned so that "similar" co-expression patterns are placed close together, resulting in an intuitive visualization of the dataset. Moreover, the model makes it possible to identify the genes that distinguish between classes (e.g., different types of cancer). Finally, each sample can be characterized with a discriminative signature, extracted from the model, that can be effectively employed for classification. RESULTS: A thorough evaluation on several gene expression datasets demonstrates the suitability of the proposed approach from two perspectives: numerically, we reached state-of-the-art classification accuracies on 5 of 7 datasets, and similar results when the approach was tested in a gene selection setting (with a stability always above 0.87); clinically, we confirmed that many of the genes the model highlights as significant also play a key role in cancer biology. CONCLUSION: The proposed framework can be successfully exploited to meaningfully visualize samples, detect medically relevant genes, and properly classify samples.
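The counting grid itself is only named in the abstract. A minimal sketch of how such a grid scores one sample is shown below, assuming the usual formulation from the counting grid literature: each grid cell holds a distribution over features (here, genes), the grid wraps around at its edges, a sample placed at a location is modeled by the average distribution in a w x w window, and its counts (here, expression values) are scored with a multinomial log-likelihood. Learning the grid (typically by EM) and the discriminative signatures are omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # grid[row][col][feature] -> probability


def window_distribution(grid: Grid, row: int, col: int, w: int) -> List[float]:
    """Average the feature distributions in the w x w window whose top-left cell
    is (row, col); indices wrap around, treating the grid as a torus."""
    n_rows, n_cols, n_feat = len(grid), len(grid[0]), len(grid[0][0])
    h = [0.0] * n_feat
    for dr in range(w):
        for dc in range(w):
            cell = grid[(row + dr) % n_rows][(col + dc) % n_cols]
            for z in range(n_feat):
                h[z] += cell[z]
    return [v / (w * w) for v in h]


def best_location(grid: Grid, counts: Sequence[float], w: int) -> Tuple[Tuple[int, int], float]:
    """Return the window location maximizing sum_z counts[z] * log h(z), and that score."""
    best, best_ll = (0, 0), -math.inf
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            h = window_distribution(grid, r, c, w)
            ll = 0.0
            for cz, hz in zip(counts, h):
                if cz > 0:
                    ll += cz * math.log(hz) if hz > 0 else -math.inf
            if ll > best_ll:
                best, best_ll = (r, c), ll
    return best, best_ll
```

The location chosen for each sample is what places similar expression profiles near each other on the grid. A window that assigns zero probability to an observed feature scores negative infinity.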


Subjects
Algorithms; Data Mining; Gene Expression Profiling; Cluster Analysis; Genes, Neoplasm; Humans; Neoplasms/genetics
15.
AIDS ; 30(5): 701-11, 2016 Mar 13.
Article in English | MEDLINE | ID: mdl-26730570

ABSTRACT

OBJECTIVES: AIDS is caused by CD4 T-cell depletion. Although combination antiretroviral therapy can restore blood T-cell numbers, the clonal diversity of the reconstituting cells, critical for immunocompetence, is not well defined. METHODS: We performed an extensive analysis of parameters of thymic function in perinatally HIV-1-infected (n = 39) and control (n = 28) participants ranging from 13 to 23 years of age. CD4 T cells, including naive (CD27 CD45RA) and recent thymic emigrant (RTE) (CD31/CD45RA) cells, were quantified by flow cytometry. Deep sequencing was used to examine T-cell receptor (TCR) sequence diversity in sorted RTE CD4 T cells. RESULTS: Infected participants had reduced CD4 T-cell levels, with predominant depletion of the memory subset and preservation of naive cells. RTE CD4 T-cell levels were normal in most infected individuals, and enhanced thymopoiesis was indicated by higher proportions of CD4 T cells containing TCR recombination excision circles. Memory CD4 T-cell depletion was highly associated with CD8 T-cell activation in HIV-1-infected persons, and plasma interleukin-7 levels were correlated with naive CD4 T cells, suggesting activation-driven loss and compensatory enhancement of thymopoiesis. Deep sequencing of CD4 T-cell receptor sequences in well-compensated infected persons demonstrated supranormal diversity, providing additional evidence of enhanced thymic output. CONCLUSION: Despite up to two decades of infection, many individuals have remarkable thymic reserve to compensate for ongoing CD4 T-cell loss, although viral replication and immune activation persist despite combination antiretroviral therapy. The longer-term sustainability of this physiology remains to be determined.
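The abstract reports supranormal TCR diversity from deep sequencing but does not name a diversity index. As an assumed example only, the sketch below computes two common ones from clonotype reads (for instance, CDR3 sequences): richness, the number of distinct clonotypes, and the Shannon entropy of the clonotype frequencies. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def repertoire_diversity(clonotypes: Iterable[str]) -> Tuple[int, float]:
    """Return (richness, Shannon entropy in nats) for a collection of reads,
    where each element is the clonotype (e.g. CDR3 sequence) of one read."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty repertoire")
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return len(counts), entropy
```

Both indices tend to grow with sequencing depth, so repertoires are often compared after subsampling to equal read counts.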


Subjects
CD4-Positive T-Lymphocytes/immunology; HIV Infections/immunology; HIV-1/growth & development; T-Lymphocyte Subsets/immunology; Thymus Gland/physiology; Adolescent; CD4-Positive T-Lymphocytes/chemistry; CD4-Positive T-Lymphocytes/classification; Female; Flow Cytometry; Genetic Variation; HIV Infections/virology; High-Throughput Nucleotide Sequencing; Humans; Leukocyte Common Antigens/analysis; Male; Platelet Endothelial Cell Adhesion Molecule-1/analysis; Receptors, Antigen, T-Cell/genetics; Sequence Analysis, DNA; T-Lymphocyte Subsets/chemistry; T-Lymphocyte Subsets/classification; Tumor Necrosis Factor Receptor Superfamily, Member 7/analysis; Young Adult
16.
Bioinformatics ; 20 Suppl 1: i161-8, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262795

ABSTRACT

MOTIVATION: We consider models useful for learning an evolutionary or phylogenetic tree from data consisting of DNA sequences corresponding to the leaves of the tree. In particular, we consider a general probabilistic model described in Siepel and Haussler, which we call the phylogenetic-HMM model and which generalizes the classical probabilistic models of Neyman and Felsenstein. Unfortunately, computing the likelihood of phylogenetic-HMM models is intractable. We consider several approximations for computing the likelihood of such models, including an approximation introduced in Siepel and Haussler, loopy belief propagation, and several variational methods. RESULTS: We demonstrate that, unlike the other approximations, variational methods are accurate and are guaranteed to lower-bound the likelihood. In addition, we identify a particular variational approximation as the best: one in which the posterior distribution is variationally approximated using the classic Neyman-Felsenstein model. Applying our best approximation to data from the cystic fibrosis transmembrane conductance regulator gene region across nine eutherian mammals reveals a CpG effect.
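The guarantee that variational methods lower-bound the likelihood follows from Jensen's inequality. For observed sequences x, hidden variables h (for example, ancestral states at internal nodes of the tree), and any distribution q over h:

```latex
\log p(x) = \log \sum_{h} q(h)\,\frac{p(x,h)}{q(h)}
\;\ge\; \sum_{h} q(h)\,\log\frac{p(x,h)}{q(h)}
\;=\; \log p(x) - \mathrm{KL}\!\left(q(h)\,\middle\|\,p(h \mid x)\right)
```

The bound is tight when q equals the true posterior. Restricting q to a tractable family, such as the Neyman-Felsenstein form used for the best approximation in the abstract, keeps the bound computable; the negative of the bound is the variational free energy.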


Subjects
Algorithms; Artificial Intelligence; Evolution, Molecular; Models, Genetic; Phylogeny; Sequence Analysis, DNA/methods; Base Sequence; Computer Simulation; Databases, Genetic; Markov Chains; Molecular Sequence Data; Pattern Recognition, Automated/methods; Sequence Homology, Nucleic Acid
17.
IEEE Trans Pattern Anal Mach Intell ; 27(9): 1392-416, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16173184

ABSTRACT

Research into methods for reasoning under uncertainty is currently one of the most exciting areas of artificial intelligence, largely because it has recently become possible to record, store, and process large amounts of data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition, face detection, speaker identification, and prediction of gene function, it is even more exciting that researchers are on the verge of introducing systems that can perform large-scale combinatorial analyses of data, decomposing the data into interacting components. For example, computational methods for automatic scene analysis are now emerging in the computer vision community. These methods decompose an input image into its constituent objects, lighting conditions, motion patterns, etc. Two of the main challenges are finding effective representations and models in specific applications and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms. We review exact techniques and various approximate, computationally efficient techniques, including iterated conditional modes, the expectation maximization (EM) algorithm, Gibbs sampling, the mean field method, variational techniques, structured variational techniques and the sum-product algorithm ("loopy" belief propagation). We describe how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.


Subjects
Algorithms; Artificial Intelligence; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Computer Simulation; Models, Biological; Models, Statistical; Reproducibility of Results; Sensitivity and Specificity
18.
IEEE Trans Pattern Anal Mach Intell ; 37(12): 2374-87, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26539844

ABSTRACT

In recent scene recognition research, images or large image regions are often represented as disorganized "bags" of features, which can then be analyzed using models originally developed to capture co-variation of word counts in text. However, image feature counts are likely to be constrained in different ways than word counts in text. For example, as a camera pans upwards from a building entrance over its first few floors and then further up into the sky (Fig. 1), feature counts change slightly as the field of view moves: the abundance of the "car" features is reduced, but the counts of the features found on building facades are increased. The counting grid model accounts for such changes naturally, and it can also account for images of different scenes.

19.
Science ; 347(6218): 1254806, 2015 Jan 09.
Article in English | MEDLINE | ID: mdl-25525159

ABSTRACT

To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.


Subjects
Artificial Intelligence; Child Development Disorders, Pervasive/genetics; Colorectal Neoplasms, Hereditary Nonpolyposis/genetics; Genome-Wide Association Study/methods; Molecular Sequence Annotation/methods; Muscular Atrophy, Spinal/genetics; RNA Splicing/genetics; Adaptor Proteins, Signal Transducing/genetics; Computer Simulation; DNA/genetics; Exons/genetics; Genetic Code; Genetic Markers; Genetic Variation; Humans; Introns/genetics; Models, Genetic; MutL Protein Homolog 1; Mutation, Missense; Nuclear Proteins/genetics; Polymorphism, Single Nucleotide; Quantitative Trait Loci; RNA Splice Sites/genetics; RNA-Binding Proteins/genetics