1.
Proc Natl Acad Sci U S A ; 112(52): 15976-81, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26598659

ABSTRACT

Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes.


Subjects
Horizontal Gene Transfer , Genome/genetics , Genomic Library , DNA Sequence Analysis/methods , Tardigrada/genetics , Animals , Archaeal DNA/chemistry , Archaeal DNA/genetics , Bacterial DNA/chemistry , Bacterial DNA/genetics , Fungal DNA/chemistry , Fungal DNA/genetics , Plant DNA/chemistry , Plant DNA/genetics , Viral DNA/chemistry , Viral DNA/genetics , Phylogeny , Tardigrada/classification
2.
Brief Bioinform ; 12(5): 485-8, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21666252

ABSTRACT

There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
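
The FASTA problems mentioned in the abstract (free-text headers and unchecked sequence content) can be illustrated with a minimal sketch. This is not SeqXML or any of the BioJava/BioPerl/Biopython readers; `read_fasta`, `invalid_residues`, and the `PROTEIN_ALPHABET` set (20 standard amino acids plus `X`) are hypothetical names and assumptions chosen only for illustration.

```python
# Assumed alphabet: 20 standard amino acids plus X for "unknown".
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}


def read_fasta(text):
    """Parse FASTA text into (header, sequence) tuples.

    Lines that appear before the first '>' header are discarded.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence, alphabet=PROTEIN_ALPHABET):
    """Return the sorted list of characters not in the allowed alphabet."""
    return sorted(set(sequence.upper()) - alphabet)


# Toy input, invented for illustration: the first record contains a stray digit.
example = ">prot1 free-text description\nMKT1LV\n>prot2\nMKTAYIAK\n"
for header, seq in read_fasta(example):
    print(header, invalid_residues(seq))
```

FASTA itself offers no way to reject the first record, whereas a schema-validated format can refuse it at parse time; the check above has to be bolted on by every consumer.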


Subjects
Factual Databases , Internet , Proteome/analysis , Software , Molecular Sequence Annotation , Proteome/standards , Sequence Analysis
3.
J Cancer Res Clin Oncol ; 149(15): 14125-14136, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37552307

ABSTRACT

PURPOSE: Anti-PD-1 therapy provides clinical benefit in 40-50% of patients with relapsed and/or metastatic head and neck squamous cell carcinoma (RM-HNSCC). Selection of anti-PD-1 therapy is typically based on patient PD-L1 immunohistochemistry (IHC), which has low specificity for predicting disease control. Therefore, there is a critical need for a clinical biomarker that will predict clinical benefit to anti-PD-1 treatment with high specificity. METHODS: Clinical treatment and outcomes data for 103 RM-HNSCC patients were paired with RNA-sequencing data from formalin-fixed patient samples. Using logistic regression methods, we developed a novel biomarker classifier based on expression patterns in the tumor immune microenvironment to predict disease control with monotherapy PD-1 inhibitors (pembrolizumab and nivolumab). The performance of the biomarker was internally validated using out-of-bag methods. RESULTS: The biomarker significantly predicted disease control (65% in predicted non-progressors vs. 17% in predicted progressors, p < 0.001) and was significantly correlated with overall survival (OS; p = 0.004). In addition, the biomarker outperformed PD-L1 IHC across numerous metrics including sensitivity (0.79 vs 0.64, respectively; p = 0.005) and specificity (0.70 vs 0.61, respectively; p = 0.009). CONCLUSION: This novel assay uses tumor immune microenvironment expression data to predict disease control and OS with high sensitivity and specificity in patients with RM-HNSCC treated with anti-PD-1 monotherapy.
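
For readers unfamiliar with the two metrics reported above, the following sketch shows how sensitivity and specificity are computed from binary outcomes and predictions. The function name and the toy labels are assumptions made for illustration; they are not the study's data or its classifier.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 stands for the positive class (here: disease control), 0 for the
    negative class. Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the fraction of true responders the test identifies; specificity is the fraction of true non-responders it correctly excludes, which is the property the abstract says PD-L1 IHC lacks.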

4.
Nucleic Acids Res ; 38(Database issue): D196-203, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19892828

ABSTRACT

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
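
As a rough illustration of what a pairwise ortholog relationship between two proteomes means, the sketch below pairs proteins that are each other's best-scoring hit (reciprocal best hits). This is a common simplification and an assumption on our part; it is not the InParanoid algorithm described in the abstract, and it omits the two-pass BLAST step and any grouping of in-paralogs. The score dictionaries and protein identifiers are invented.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict of {(query_id, target_id): similarity_score}
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores between proteins of species A and species B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a3"): 118.0,
          ("b2", "a2"): 175.0, ("b2", "a1"): 88.0}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data `a3` hits `b1`, but `b1` prefers `a1`, so `a3` is left unpaired.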


Subjects
Computational Biology/methods , Genetic Databases , Nucleic Acid Databases , Escherichia coli/genetics , Eukaryotic Cells/chemistry , Proteins/genetics , Algorithms , Animals , Cluster Analysis , Computational Biology/trends , Escherichia coli/metabolism , Bacterial Genome , Humans , Information Storage and Retrieval/methods , Internet , Tertiary Protein Structure , Proteomics/methods , Software
5.
Sci Rep ; 12(1): 1342, 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35079117

ABSTRACT

Anti-PD-1 therapy can provide long, durable benefit to a fraction of patients. The on-label PD-L1 test, however, does not accurately predict response. To build a better biomarker, we created a method called T Cell Subtype Profiling (TCSP) that characterizes the abundance of T cell subtypes (TCSs) in FFPE specimens using five RNA models. These TCS RNA models are created using functional methods, and robustly discriminate between naïve, activated, exhausted, effector memory, and central memory TCSs, without the reliance on non-specific, classical markers. TCSP is analytically valid and corroborates associations between TCSs and clinical outcomes. Multianalyte biomarkers based on TCS estimates predicted response to anti-PD-1 therapy in three different cancers and outperformed the indicated PD-L1 test, as well as Tumor Mutational Burden. Given the utility of TCSP, we investigated the abundance of TCSs in TCGA cancers and created a portal to enable researchers to discover other TCSP-based biomarkers.


Subjects
CD8-Positive T-Lymphocytes/metabolism , Neoplasms/drug therapy , Programmed Cell Death 1 Receptor/metabolism , Tumor Biomarkers/metabolism , CD8-Positive T-Lymphocytes/pathology , Cultured Cells , Humans , Mononuclear Leukocytes
6.
Bioinformatics ; 25(10): 1333-4, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19297349

ABSTRACT

SUMMARY: The rise in biological sequence data has led to a proliferation of separate, specialized databases. While there is great value in having many independent annotations, it is critical that there be a way to integrate them in one combined view. The Distributed Annotation System (DAS) was developed for that very purpose. There are currently no DAS clients that are open source, specialized for aggregating and comparing protein sequence annotation, and that can run as a self-contained application outside of a web browser. The speed, flexibility and extensibility that come with a stand-alone application motivated us to create DASher, an open-source Java DAS client. Given a UniProt sequence identifier, DASher automatically queries DAS-supporting servers worldwide for any information on that sequence and then displays the annotations in an interactive viewer for easy comparison. DASher is a fast, Java-based DAS client optimized for viewing protein sequence annotation and compliant with the latest DAS protocol specification 1.53E. AVAILABILITY: DASher is available for direct use and download at http://dasher.sbc.su.se including examples and source code under the GPLv3 licence. Java version 6 or higher is required.


Subjects
Computational Biology/methods , Proteins/chemistry , Protein Sequence Analysis/methods , Software , Protein Databases , Programming Languages , User-Computer Interface
7.
J Mol Diagn ; 22(4): 555-570, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32036085

ABSTRACT

As immuno-oncology drugs grow more popular in the treatment of cancer, better methods are needed to quantify the tumor immune cell component to determine which patients are most likely to benefit from treatment. Methods such as flow cytometry can accurately assess the composition of infiltrating immune cells; however, they show limited use in formalin-fixed, paraffin-embedded (FFPE) specimens. This article describes a novel hybrid-capture RNA sequencing assay, ImmunoPrism, that estimates the relative percentage abundance of eight immune cell types in FFPE solid tumors. Immune health expression models were generated using machine learning methods and used to uniquely identify each immune cell type using the most discriminatively expressed genes. The analytical performance of the assay was assessed using 101 libraries from 40 FFPE and 32 fresh-frozen samples. With defined samples, ImmunoPrism had a precision of ±2.72%, a total error of 2.75%, and a strong correlation (r² = 0.81; P < 0.001) to flow cytometry. ImmunoPrism had similar performance in dissociated tumor cell samples (total error of 8.12%) and correlated strongly with immunohistochemistry (CD8: r² = 0.83; P < 0.001) in FFPE samples. Other performance metrics were determined, including limit of detection, reportable range, and reproducibility. The approach used for analytical validation is shared here so that it may serve as a helpful framework for other laboratories when validating future complex RNA-based assays.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Immunomodulation/genetics , Neoplasms/genetics , Neoplasms/immunology , Computational Biology/standards , Gene Expression Profiling/standards , High-Throughput Nucleotide Sequencing , Humans , Immunohistochemistry , Mononuclear Leukocytes/immunology , Mononuclear Leukocytes/metabolism , Lymphocytes/immunology , Lymphocytes/metabolism , Neoplasms/metabolism , Neoplasms/pathology , Reproducibility of Results , Sensitivity and Specificity , RNA Sequence Analysis
8.
BMC Bioinformatics ; 10: 314, 2009 Sep 28.
Article in English | MEDLINE | ID: mdl-19785723

ABSTRACT

BACKGROUND: Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e., which parts of the amino acid sequence are buried in the membrane and which are located on either side of the membrane), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines results of several predictors to yield a more reliable result. RESULTS: A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences, it correctly predicts 86.7% of all topologies. CONCLUSION: Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as DAS server at http://MetaTM.sbc.su.se.
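
The consensus idea can be sketched with a plain per-residue majority vote. Note that this is a deliberately simpler stand-in: MetaTM combines its predictors with support vector machine models, not voting. The label convention (`i` inside, `M` membrane, `o` outside), the function name, and the three toy predictions are assumptions for illustration.

```python
from collections import Counter


def consensus_topology(predictions):
    """Combine equal-length topology strings by per-residue majority vote.

    Ties go to the label that appears first among the predictors at that
    position (the order in which Counter first encountered the labels).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        label, _count = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)


# Three hypothetical predictor outputs for the same 8-residue sequence.
predictions = ["iiMMMMoo", "iiiMMMoo", "iMMMMMoo"]
print(consensus_topology(predictions))  # iiMMMMoo
```

Each predictor disagrees with the others at one boundary position, and the vote recovers the topology that two of three support at every residue.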


Subjects
Computational Biology/methods , Membrane Proteins/chemistry , Software , Algorithms , Protein Databases , Protein Conformation , Protein Sequence Analysis/methods
9.
Sci Rep ; 8(1): 28, 2018 Jan 08.
Article in English | MEDLINE | ID: mdl-29311716

ABSTRACT

Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows little or no homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family, hence it was named bacteriophage HFM.


Subjects
Viral Genome , Metagenome , Metagenomics , Viral Proteins/genetics , Base Sequence , Cluster Analysis , Computational Biology/methods , Humans , Markov Chains , Metagenomics/methods , Molecular Sequence Annotation , Open Reading Frames
10.
Plant Genome ; 8(2): eplantgenome2014.08.0040, 2015 Jul.
Article in English | MEDLINE | ID: mdl-33228321

ABSTRACT

Leaf rust, caused by Puccinia triticina Eriks., is one of the most widespread diseases of wheat, and breeding for resistance is one of the most effective methods of control. Lr16 is a wheat leaf rust resistance gene (R-gene) that provides resistance at both the seedling and adult stages. Simple-sequence repeat (SSR) markers have been used to map Lr16 to the distal end of chromosome 2B. The objectives of this study were to use RNA sequencing (RNA-seq) and in silico subtraction to identify new R-gene analogs (RGAs) and use them as Lr16 markers. RNA was isolated from the susceptible wheat cultivar Thatcher (Tc) and the resistant Tc isolines TcLr10, TcLr16, and TcLr21, and sequenced using Illumina technology. Using in silico subtraction, sequences from the resistant Tc isolines were aligned to a Tc reference expressed sequence tag (EST) set. Sequences not aligning to the Tc reference were assembled into contigs and analyzed using BLASTx to determine putative gene functions. Primer pairs were designed for 181 RGA sequences, of which 137 amplified in at least one of the parents. A mapping population was developed with 165 F2 lines from a cross between the rust-susceptible cultivar Chinese Spring (CS) and TcLr16. Two RGA markers, XTaLr16_RGA266585 and XTaLr16_RGA22128, were identified that mapped proximally 1.2 and 23.8 cM from Lr16, respectively. Three SSR markers, Xwmc764, Xwmc661, and Xbarc35, mapped between these two RGA markers at distances of 5.0, 10.9, and 16.1 cM from Lr16, respectively. In silico subtraction is an effective technique for isolating RGAs linked to R-genes of interest.
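
The subtraction step (keep only reads that do not align to the susceptible reference) can be sketched as follows. Real pipelines use a read aligner; here exact shared k-mers serve as a crude stand-in for "aligns to the reference", which is an assumption of this sketch and not the method used in the study. All function names, the value of `k`, and the toy sequences are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def subtract_reads(reads, reference_sequences, k=8):
    """Return reads that share no k-mer with any reference sequence.

    Reads shorter than k have no k-mers and are therefore always kept.
    """
    reference_kmers = set()
    for ref in reference_sequences:
        reference_kmers |= kmers(ref, k)
    return [r for r in reads if kmers(r, k).isdisjoint(reference_kmers)]


# Toy data: one reference EST and two reads from a resistant line.
reference = ["ACGTACGTTGCA"]
reads = ["CGTACGTTGC",         # overlaps the reference, so it is subtracted
         "GGGGCCCCAAAATTTT"]   # no shared 8-mer, so it is retained

print(subtract_reads(reads, reference))  # ['GGGGCCCCAAAATTTT']
```

The retained reads correspond to the "sequences not aligning to the Tc reference" that the abstract then assembles into contigs.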

11.
Hum Mol Genet ; 14(7): 903-12, 2005 Apr 01.
Article in English | MEDLINE | ID: mdl-15703188

ABSTRACT

Craniofacial abnormalities are one of the most common birth defects in humans, but little is known about the human genes that control these important developmental processes. To identify relevant genes, we analyzed transcription profiles of human pharyngeal arch 1 (PA1), a conserved embryonic structure that develops into the palate and jaw. Using microdissected, normal human craniofacial structures, we constructed 12 SAGE (serial analysis of gene expression) libraries and sequenced 606 532 tags. We also performed Affymetrix microarray analysis on 25 craniofacial targets. Our data revealed not only genes "enriched" or differentially expressed in PA1 during the fourth and fifth weeks of human development, but also 6927 genes newly identified to be expressed in human PA1. Many of these genes are involved in biosynthetic processes and have binding function and catalytic activity. We compared expression profiles of human genes with those of mouse homologs to look for genes more specific to human craniofacial development and found 766 genes expressed in human PA1, but not in mouse PA1. We also identified 1408 genes that were expressed in mouse as well as human PA1 and could be useful in creating mouse models for human conditions. We confirmed conservation of some human PA1 expression patterns in mouse embryonic samples with whole mount in situ hybridization and real-time RT-PCR. This comprehensive approach to expression profiling gives insights into the early development of the craniofacial region and provides markers for developmental structures and candidate genes, including SET and CCT3, for diseases such as orofacial clefting and micrognathia.


Subjects
Branchial Region/embryology , Embryonic Development , Developmental Gene Expression Regulation , Animals , Catalysis , Craniofacial Abnormalities/genetics , Complementary DNA/metabolism , Animal Disease Models , Gene Library , Humans , In Situ Hybridization , Mice , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Species Specificity , Time Factors
12.
Genome Res ; 14(10B): 2041-7, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15489324

ABSTRACT

Transcription factors (TFs) are essential regulators of gene expression, and mutated TF genes have been shown to cause numerous human genetic diseases. Yet to date, no single, comprehensive database of human TFs exists. In this work, we describe the collection of an essentially complete set of TF genes from one depiction of the human ORFeome, and the design of a microarray to interrogate their expression. Taking 1468 known TFs from TRANSFAC, InterPro, and FlyBase, we used this seed set to search the ScriptSure human transcriptome database for additional genes. ScriptSure's genome-anchored transcript clusters allowed us to work with a nonredundant high-quality representation of the human transcriptome. We used a high-stringency similarity search by using BLASTN, and a protein motif search of the human ORFeome by using hidden Markov models of DNA-binding domains known to occur exclusively or primarily in TFs. Four hundred ninety-four additional TF genes were identified in the overlap between the two searches, bringing our estimate of the total number of human TFs to 1962. Zinc finger genes are by far the most abundant family (762 members), followed by homeobox (199 members) and basic helix-loop-helix genes (117 members). We designed a microarray of 50-mer oligonucleotide probes targeted to a unique region of the coding sequence of each gene. We have successfully used this microarray to interrogate TF gene expression in species as diverse as chickens and mice, as well as in humans.
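
The counting logic in this abstract (new TF genes are those found by both searches, added to the seed set) can be checked with a short sketch. The three totals and the family sizes come from the abstract; the gene identifiers in the two toy hit sets are invented placeholders, and the set intersection is only our reading of "the overlap between the two searches".

```python
# Figures stated in the abstract.
seed_tfs = 1468          # known TFs from TRANSFAC, InterPro, and FlyBase
additional_tfs = 494     # found in the overlap of the two searches
total_tfs = seed_tfs + additional_tfs
print(total_tfs)  # 1962

# Family sizes from the abstract, expressed as a share of the total estimate.
family_sizes = {"zinc finger": 762, "homeobox": 199, "basic helix-loop-helix": 117}
for family, size in family_sizes.items():
    print(f"{family}: {size / total_tfs:.1%}")

# Hypothetical illustration of taking the overlap of two searches:
# only candidates supported by both methods are retained.
blastn_hits = {"geneA", "geneB", "geneC"}   # placeholder IDs, similarity search
hmm_hits = {"geneB", "geneC", "geneD"}      # placeholder IDs, motif search
supported_by_both = blastn_hits & hmm_hits
print(sorted(supported_by_both))  # ['geneB', 'geneC']
```

Requiring agreement between a nucleotide similarity search and a protein motif search trades recall for precision, which fits the abstract's emphasis on high stringency.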


Subjects
Gene Expression Profiling , Human Genome , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Transcription Factors/chemistry , Transcription Factors/genetics , Humans , Markov Chains , Transcription Factors/metabolism