Search | BVS Infectious and Parasitic Diseases

1.

A crowdsourcing open platform for literature curation in UniProt.

Wang, Yuqi; Wang, Qinghua; Huang, Hongzhan; Huang, Wei; Chen, Yongxing; McGarvey, Peter B; Wu, Cathy H; Arighi, Cecilia N.

PLoS Biol ; 19(12): e3001464, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.

Subjects

Crowdsourcing/methods; Data Curation/methods; Molecular Sequence Annotation/methods; Amino Acid Sequence/genetics; Computational Biology/methods; Databases, Protein/trends; Humans; Literature; Proteins/metabolism; Stakeholder Participation

2.

UniProt genomic mapping for deciphering functional effects of missense variants.

McGarvey, Peter B; Nightingale, Andrew; Luo, Jie; Huang, Hongzhan; Martin, Maria J; Wu, Cathy.

Hum Mutat ; 40(6): 694-705, 2019 Jun.

Article in English | MEDLINE | ID: mdl-30840782

ABSTRACT

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
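To illustrate the colocation comparison described above, the following minimal Python sketch computes what fraction of each variant set falls on a genomic position shared with the other set. It assumes that two variants are colocated when they share a chromosome and coordinate; the function name, the coordinate representation, and the data are hypothetical and are not taken from the paper, UniProtKB, or ClinVar.

```python
from typing import Set, Tuple

# Assumed representation: (chromosome, 1-based coordinate).
Position = Tuple[str, int]


def colocation_fractions(set_a: Set[Position], set_b: Set[Position]) -> Tuple[float, float]:
    """Return the fraction of each set whose positions also occur in the other set."""
    shared = set_a & set_b
    frac_a = len(shared) / len(set_a) if set_a else 0.0
    frac_b = len(shared) / len(set_b) if set_b else 0.0
    return frac_a, frac_b


# Invented toy coordinates, for illustration only.
protein_variants = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 7)}
clinical_snps = {
    ("chr1", 100), ("chr2", 50), ("chr4", 1), ("chr4", 2),
    ("chr5", 3), ("chr5", 4), ("chr6", 5), ("chr6", 6),
}

frac_protein, frac_clinical = colocation_fractions(protein_variants, clinical_snps)
print(f"{frac_protein:.0%} of protein variants colocate with {frac_clinical:.0%} of clinical SNPs")
```

The two percentages differ because the same shared positions are divided by sets of different sizes, which is why a statement of the form "X% of one set colocates with Y% of the other" needs both numbers.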

Subjects

Chromosome Mapping/methods; Databases, Genetic; Mutation, Missense; Proteins/chemistry; Binding Sites; Databases, Protein; Genetic Predisposition to Disease; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide; Protein Binding; Proteins/genetics; Proteins/metabolism; Software; Web Browser

3.

ClinGen Allele Registry links information about genetic variants.

Pawliczek, Piotr; Patel, Ronak Y; Ashmore, Lillian R; Jackson, Andrew R; Bizon, Chris; Nelson, Tristan; Powell, Bradford; Freimuth, Robert R; Strande, Natasha; Shah, Neethu; Paithankar, Sameer; Wright, Matt W; Dwight, Selina; Zhen, Jimmy; Landrum, Melissa; McGarvey, Peter; Babb, Larry; Plon, Sharon E; Milosavljevic, Aleksandar.

Hum Mutat ; 39(11): 1690-1701, 2018 Nov.

Article in English | MEDLINE | ID: mdl-30311374

ABSTRACT

Effective exchange of information about genetic variants is currently hampered by the lack of readily available globally unique variant identifiers that would enable aggregation of information from different sources. The ClinGen Allele Registry addresses this problem by providing (1) globally unique "canonical" variant identifiers (CAids) on demand, either individually or in large batches; (2) access to variant-identifying information in a searchable Registry; (3) links to allele-related records in many commonly used databases; and (4) services for adding links to information about registered variants in external sources. A core element of the Registry is a canonicalization service, implemented using in-memory sequence alignment-based index, which groups variant identifiers denoting the same nucleotide variant and assigns unique and dereferenceable CAids. More than 650 million distinct variants are currently registered, including those from gnomAD, ExAC, dbSNP, and ClinVar, including a small number of variants registered by Registry users. The Registry is accessible both via a web interface and programmatically via well-documented Hypertext Transfer Protocol (HTTP) Representational State Transfer Application Programming Interface (REST-APIs). For programmatic interoperability, the Registry content is accessible in the JavaScript Object Notation for Linked Data (JSON-LD) format. We present several use cases and demonstrate how the linked information may provide raw material for reasoning about variant's pathogenicity.

Subjects

Databases, Genetic; Genetic Variation/genetics; Alleles; Humans; Registries; Software

4.

Computational clustering for viral reference proteomes.

Chen, Chuming; Huang, Hongzhan; Mazumder, Raja; Natale, Darren A; McGarvey, Peter B; Zhang, Jian; Polson, Shawn W; Wang, Yuqi; Wu, Cathy H.

Bioinformatics ; 32(13): 2041-3, 2016 Jul 01.

Article in English | MEDLINE | ID: mdl-27153712

ABSTRACT

MOTIVATION: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. RESULTS: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. AVAILABILITY AND IMPLEMENTATION: http://proteininformationresource.org/rps/viruses/ CONTACT: chenc@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Databases, Protein; Proteome/analysis; Viral Proteins/analysis; Amino Acid Sequence; Cluster Analysis; Computational Biology; Knowledge Bases

5.

UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.

Suzek, Baris E; Wang, Yuqi; Huang, Hongzhan; McGarvey, Peter B; Wu, Cathy H.

Bioinformatics ; 31(6): 926-32, 2015 Mar 15.

Article in English | MEDLINE | ID: mdl-25398609

ABSTRACT

MOTIVATION: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. RESULTS: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (~7 times shorter hit list before expansion), faster (~6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforces its reliability for use in functional annotation.
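The "search representatives, then expand with cluster members" step mentioned in this abstract can be sketched in a few lines of Python. This is only a schematic illustration: it assumes the search has already returned a list of cluster representatives, and the function name, identifiers, and cluster table are hypothetical rather than real UniRef data or any UniProt API.

```python
from typing import Dict, List


def expand_hits(representative_hits: List[str],
                cluster_members: Dict[str, List[str]]) -> List[str]:
    """Expand a hit list of cluster representatives with all cluster members.

    Hit order is preserved and duplicate accessions are dropped.
    """
    expanded: List[str] = []
    seen = set()
    for rep in representative_hits:
        # A representative with no recorded members still counts as itself.
        for accession in [rep] + cluster_members.get(rep, []):
            if accession not in seen:
                seen.add(accession)
                expanded.append(accession)
    return expanded


# Hypothetical toy clusters keyed by representative identifier.
clusters = {
    "REP_A": ["MEMBER_A1", "MEMBER_A2"],
    "REP_B": ["MEMBER_B1"],
}
hits = ["REP_A", "REP_B"]
expanded = expand_hits(hits, clusters)
print(len(hits), "representative hits expand to", len(expanded), "sequences")
```

The short hit list is what the search itself has to produce; the expansion is a cheap lookup afterwards, which is consistent with the abstract's point that the pre-expansion hit list is shorter.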

Subjects

Computational Biology; Databases, Protein; Dioxygenases/metabolism; Membrane Proteins/metabolism; Proteins/metabolism; Sequence Analysis, Protein; Software; AlkB Homolog 5, RNA Demethylase; Cluster Analysis; Dioxygenases/chemistry; Dioxygenases/genetics; Gene Ontology; Humans; Information Storage and Retrieval; Membrane Proteins/chemistry; Membrane Proteins/genetics; Molecular Sequence Annotation; Proteins/chemistry; Proteins/genetics

6.

The CPTAC Data Portal: A Resource for Cancer Proteomics Research.

Edwards, Nathan J; Oberti, Mauricio; Thangudu, Ratna R; Cai, Shuang; McGarvey, Peter B; Jacob, Shine; Madhavan, Subha; Ketchum, Karen A.

J Proteome Res ; 14(6): 2707-13, 2015 Jun 05.

Article in English | MEDLINE | ID: mdl-25873244

ABSTRACT

The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute's Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample's proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein's sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium is made freely available to the public through the data portal.

Subjects

Biomedical Research; Databases, Protein; Neoplasm Proteins; Proteomics; Humans; Information Storage and Retrieval; Neoplasm Proteins/metabolism; Neoplasms/genetics; Neoplasms/metabolism

7.

In silico analysis of autoimmune diseases and genetic relationships to vaccination against infectious diseases.

McGarvey, Peter B; Suzek, Baris E; Baraniuk, James N; Rao, Shruti; Conkright, Brian; Lababidi, Samir; Sutherland, Andrea; Forshee, Richard; Madhavan, Subha.

BMC Immunol ; 15: 61, 2014 Dec 09.

Article in English | MEDLINE | ID: mdl-25486901

ABSTRACT

BACKGROUND: Near universal administration of vaccines mandates intense pharmacovigilance for vaccine safety and a stringently low tolerance for adverse events. Reports of autoimmune diseases (AID) following vaccination have been challenging to evaluate given the high rates of vaccination, background incidence of autoimmunity, and low incidence and variable times for onset of AID after vaccinations. In order to identify biologically plausible pathways to adverse autoimmune events of vaccine-related AID, we used a systems biology approach to create a matrix of innate and adaptive immune mechanisms active in specific diseases, responses to vaccine antigens, adjuvants, preservatives and stabilizers, for the most common vaccine-associated AID found in the Vaccine Adverse Event Reporting System. RESULTS: This report focuses on Guillain-Barre Syndrome (GBS), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Idiopathic (or immune) Thrombocytopenic Purpura (ITP). Multiple curated databases and automated text mining of PubMed literature identified 667 genes associated with RA, 448 with SLE, 49 with ITP and 73 with GBS. While all data sources provided valuable and unique gene associations, text mining using natural language processing (NLP) algorithms provided the most information but required curation to remove incorrect associations. Six genes were associated with all four AIDs. Thirty-three pathways were shared by the four AIDs. Classification of genes into twelve immune system related categories identified more "Th17 T-cell subtype" genes in RA than the other AIDs, and more "Chemokine plus Receptors" genes associated with RA than SLE. Gene networks were visualized and clustered into interconnected modules with specific gene clusters for each AID, including one in RA with ten C-X-C motif chemokines. The intersection of genes associated with GBS, GBS peptide auto-antigens, influenza A infection, and influenza vaccination created a subnetwork of genes that inferred a possible role for the MAPK signaling pathway in influenza vaccine related GBS. CONCLUSIONS: Results showing unique and common gene sets, pathways, immune system categories and functional clusters of genes in four autoimmune diseases suggest it is possible to develop molecular classifications of autoimmune and inflammatory events. Combining this information with cellular and other disease responses should greatly aid in the assessment of potential immune-mediated adverse events following vaccination.

Subjects

Autoimmune Diseases; Computer Simulation; Infection Control; Infections/immunology; Models, Immunological; Vaccination; Vaccines; Adaptive Immunity; Autoimmune Diseases/genetics; Autoimmune Diseases/immunology; Autoimmune Diseases/pathology; Humans; Infections/genetics; Infections/pathology; Vaccines/adverse effects; Vaccines/immunology

8.

NCI's Proteomic Data Commons: A Cloud-Based Proteomics Repository Empowering Comprehensive Cancer Analysis through Cross-Referencing with Genomic and Imaging Data.

Thangudu, Ratna R; Holck, Michael; Singhal, Deepak; Pilozzi, Alexander; Edwards, Nathan; Rudnick, Paul A; Domagalski, Marcin J; Chilappagari, Padmini; Ma, Lei; Xin, Yi; Le, Toan; Nyce, Kristen; Chaudhary, Rekha; Ketchum, Karen A; Maurais, Aaron; Connolly, Brian; Riffle, Michael; Chambers, Matthew C; MacLean, Brendan; MacCoss, Michael J; McGarvey, Peter B; Basu, Anand; Otridge, John; Casas-Silva, Esmeralda; Venkatachari, Sudha; Rodriguez, Henry; Zhang, Xu.

Cancer Res Commun ; 4(9): 2480-2488, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-39225545

ABSTRACT

Proteomics has emerged as a powerful tool for studying cancer biology, developing diagnostics, and therapies. With the continuous improvement and widespread availability of high-throughput proteomic technologies, the generation of large-scale proteomic data has become more common in cancer research, and there is a growing need for resources that support the sharing and integration of multi-omics datasets. Such datasets require extensive metadata including clinical, biospecimen, and experimental and workflow annotations that are crucial for data interpretation and reanalysis. The need to integrate, analyze, and share these data has led to the development of NCI's Proteomic Data Commons (PDC), accessible at https://pdc.cancer.gov. As a specialized repository within the NCI Cancer Research Data Commons (CRDC), PDC enables researchers to locate and analyze proteomic data from various cancer types and connect with genomic and imaging data available for the same samples in other CRDC nodes. Presently, PDC houses annotated data from more than 160 datasets across 19 cancer types, generated by several large-scale cancer research programs with cohort sizes exceeding 100 samples (tumor and associated normal when available). In this article, we review the current state of PDC in cancer research, discuss the opportunities and challenges associated with data sharing in proteomics, and propose future directions for the resource. SIGNIFICANCE: The Proteomic Data Commons (PDC) plays a crucial role in advancing cancer research by providing a centralized repository of high-quality cancer proteomic data, enriched with extensive clinical annotations. By integrating and cross-referencing with complementary genomic and imaging data, the PDC facilitates multi-omics analyses, driving comprehensive insights, and accelerating discoveries across various cancer types.

Subjects

Cloud Computing; Genomics; National Cancer Institute (U.S.); Neoplasms; Proteomics; Humans; Proteomics/methods; Neoplasms/genetics; Neoplasms/metabolism; Neoplasms/diagnosis; Genomics/methods; United States

9.

A comprehensive protein-centric ID mapping service for molecular data integration.

Huang, Hongzhan; McGarvey, Peter B; Suzek, Baris E; Mazumder, Raja; Zhang, Jian; Chen, Yongxing; Wu, Cathy H.

Bioinformatics ; 27(8): 1190-1, 2011 Apr 15.

Article in English | MEDLINE | ID: mdl-21478197

ABSTRACT

MOTIVATION: Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY: www.uniprot.org/mapping and proteininformation-resource.org/pirwww/search/idmapping.shtml CONTACT: huang@dbi.udel.edu.
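A protein-centric mapping can be pictured as a two-step lookup: from the source identifier to the protein accessions it is attached to, then from those accessions to identifiers of the target type. The Python sketch below illustrates that idea under stated assumptions; the table layout, function names, ID type labels, and identifiers are all invented for illustration and do not describe the actual service or its data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed row layout: (protein accession, ID type, external ID).
Row = Tuple[str, str, str]
ToProtein = Dict[Tuple[str, str], Set[str]]
FromProtein = Dict[Tuple[str, str], Set[str]]


def build_indexes(rows: Iterable[Row]) -> Tuple[ToProtein, FromProtein]:
    """Index the mapping table in both directions around the protein accession."""
    to_protein: ToProtein = defaultdict(set)      # (id_type, external_id) -> accessions
    from_protein: FromProtein = defaultdict(set)  # (accession, id_type) -> external_ids
    for accession, id_type, external_id in rows:
        to_protein[(id_type, external_id)].add(accession)
        from_protein[(accession, id_type)].add(external_id)
    return to_protein, from_protein


def map_id(external_id: str, source_type: str, target_type: str,
           to_protein: ToProtein, from_protein: FromProtein) -> List[str]:
    """Map an ID of source_type to IDs of target_type via shared protein accessions."""
    targets: Set[str] = set()
    for accession in to_protein.get((source_type, external_id), ()):
        targets |= from_protein.get((accession, target_type), set())
    return sorted(targets)


# Invented toy table; none of these identifiers are real.
rows = [
    ("ACC_1", "gene", "G1"),
    ("ACC_1", "structure", "S1"),
    ("ACC_1", "structure", "S2"),
    ("ACC_2", "gene", "G2"),
    ("ACC_2", "structure", "S3"),
]
to_protein, from_protein = build_indexes(rows)
print(map_id("G1", "gene", "structure", to_protein, from_protein))
```

Routing every mapping through the protein accession means each external database needs only one table against the protein resource, rather than one table per pair of databases.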

Subjects

Databases, Protein; Proteins/chemistry; Proteins/genetics; Software; Internet

10.

De novo assembly and annotation of transcriptomes from two cultivars of Cannabis sativa with different cannabinoid profiles.

McGarvey, Peter; Huang, Jiahao; McCoy, Matthew; Orvis, Joshua; Katsir, Yael; Lotringer, Nitzan; Nesher, Iris; Kavarana, Malcolm; Sun, Mingyang; Peet, Richard; Meiri, David; Madhavan, Subha.

Gene ; 762: 145026, 2020 Dec 15.

Article in English | MEDLINE | ID: mdl-32781193

ABSTRACT

Cannabis has been cultivated for millennia for medicinal, industrial and recreational uses. Our long-term goal is to compare the transcriptomes of cultivars with different cannabinoid profiles for therapeutic purposes. Here we describe the de novo assembly, annotation and initial analysis of two cultivars of Cannabis, a high THC variety and a CBD plus THC variety. Cultivars were grown under different lighting conditions; flower buds were sampled over 71 days. Cannabinoid profiles were determined by ESI-LC/MS. RNA samples were sequenced using the HiSeq4000 platform. Transcriptomes were assembled using the DRAP pipeline and annotated using the BLAST2GO pipeline and other tools. Each transcriptome contained over twenty thousand protein encoding transcripts with ORFs and flanking sequence. Identification of transcripts for cannabinoid pathway and related enzymes showed full-length ORFs that align with the draft genomes of the Purple Kush and Finola cultivars. Two transcripts were found for olivetolic acid cyclase (OAC) that mapped to distinct locations on the Purple Kush genome suggesting multiple genes for OAC are expressed in some cultivars. The ability to make high quality annotated reference transcriptomes in Cannabis or other plants can promote rapid comparative analysis between cultivars and growth conditions in Cannabis and other organisms without annotated genome assemblies.

Subjects

Cannabinoids/biosynthesis; Cannabis/genetics; Transcriptome; Cannabis/classification; Cannabis/metabolism; Intramolecular Transferases/genetics; Intramolecular Transferases/metabolism; Molecular Sequence Annotation; Plant Proteins/genetics; Plant Proteins/metabolism

11.

Infrastructure for the life sciences: design and implementation of the UniProt website.

Jain, Eric; Bairoch, Amos; Duvaud, Severine; Phan, Isabelle; Redaschi, Nicole; Suzek, Baris E; Martin, Maria J; McGarvey, Peter; Gasteiger, Elisabeth.

BMC Bioinformatics ; 10: 136, 2009 May 08.

Article in English | MEDLINE | ID: mdl-19426475

ABSTRACT

BACKGROUND: The UniProt consortium was formed in 2002 by groups from the Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) at Georgetown University, and soon afterwards the website http://www.uniprot.org was set up as a central entry point to UniProt resources. Requests to this address were redirected to one of the three organisations' websites. While these sites shared a set of static pages with general information about UniProt, their pages for searching and viewing data were different. To provide users with a consistent view and to cut the cost of maintaining three separate sites, the consortium decided to develop a common website for UniProt. Following several years of intense development and a year of public beta testing, the http://www.uniprot.org domain was switched to the newly developed site described in this paper in July 2008. DESCRIPTION: The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The http://www.uniprot.org website is the primary access point to this data and to documentation and basic tools for the data. These tools include full text and field-based text search, similarity search, multiple sequence alignment, batch retrieval and database identifier mapping. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as to machines for programmatic access. http://www.uniprot.org/ is open for both academic and commercial use. The site was built with open source tools and libraries. Feedback is very welcome and should be sent to help@uniprot.org. CONCLUSION: The new UniProt website makes accessing and understanding UniProt easier than ever. The two main lessons learned are that getting the basics right for such a data provider website has huge benefits, but is not trivial and easy to underestimate, and that there is no substitute for using empirical data throughout the development process to decide on what is and what is not working for your users.

Subjects

Databases, Protein; Sequence Analysis, Protein; Information Storage and Retrieval/methods; Internet; Proteins/chemistry; User-Computer Interface

12.

UniRef: comprehensive and non-redundant UniProt reference clusters.

Suzek, Baris E; Huang, Hongzhan; McGarvey, Peter; Mazumder, Raja; Wu, Cathy H.

Bioinformatics ; 23(10): 1282-8, 2007 May 15.

Article in English | MEDLINE | ID: mdl-17379688

ABSTRACT

MOTIVATION: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. RESULTS: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. AVAILABILITY: UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
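The idea of combining identical sequences and subfragments into a single entry can be illustrated with a deliberately naive Python sketch. It is not the UniRef procedure: it assumes plain substring containment as the test for a subfragment, runs in quadratic time, and uses an invented function name and invented toy sequences.

```python
from typing import Dict, List


def collapse_identical_and_subfragments(sequences: Dict[str, str]) -> Dict[str, List[str]]:
    """Group identical sequences and subfragments under the longest containing sequence.

    Returns a mapping from representative ID to the IDs merged into its entry.
    """
    # Longest first, so a fragment is always processed after any sequence containing it.
    ordered = sorted(sequences.items(), key=lambda item: (-len(item[1]), item[0]))
    clusters: Dict[str, List[str]] = {}
    for seq_id, seq in ordered:
        for rep_id in clusters:
            if seq in sequences[rep_id]:  # identical or contained fragment
                clusters[rep_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # no containing sequence: start a new entry
    return clusters


# Invented toy sequences, for illustration only.
toy = {
    "s1": "MKTAYIAKQR",
    "s2": "MKTAYIAKQR",  # identical to s1
    "s3": "TAYIA",       # subfragment of s1
    "s4": "GGGSSS",      # unrelated
}
entries = collapse_identical_and_subfragments(toy)
print(entries)
```

Clustering at lower identity levels, such as the 90% and 50% levels named in the abstract, requires an alignment-based identity measure instead of substring containment and is not attempted here.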

Subjects

Computational Biology; Databases, Protein; Proteins/chemistry; Amino Acid Sequence; Animals; Humans; Information Storage and Retrieval

13.

ClinGen Cancer Somatic Working Group - standardizing and democratizing access to cancer molecular diagnostic data to drive translational research.

Madhavan, Subha; Ritter, Deborah; Micheel, Christine; Rao, Shruti; Roy, Angshumoy; Sonkin, Dmitriy; McCoy, Matthew; Griffith, Malachi; Griffith, Obi L; McGarvey, Peter; Kulkarni, Shashikant.

Pac Symp Biocomput ; 23: 247-258, 2018.

Article in English | MEDLINE | ID: mdl-29218886

ABSTRACT

A growing number of academic and community clinics are conducting genomic testing to inform treatment decisions for cancer patients (1). In the last 3-5 years, there has been a rapid increase in clinical use of next generation sequencing (NGS) based cancer molecular diagnostic (MolDx) testing (2). The increasing availability and decreasing cost of tumor genomic profiling means that physicians can now make treatment decisions armed with patient-specific genetic information. Accumulating research in the cancer biology field indicates that there is significant potential to improve cancer patient outcomes by effectively leveraging this rich source of genomic data in treatment planning (3). To achieve truly personalized medicine in oncology, it is critical to catalog cancer sequence variants from MolDx testing for their clinical relevance along with treatment information and patient outcomes, and to do so in a way that supports large-scale data aggregation and new hypothesis generation. One critical challenge to encoding variant data is adopting a standard of annotation of those variants that are clinically actionable. Through the NIH-funded Clinical Genome Resource (ClinGen) (4), in collaboration with NLM's ClinVar database and >50 academic and industry based cancer research organizations, we developed the Minimal Variant Level Data (MVLD) framework to standardize reporting and interpretation of drug associated alterations (5). We are currently involved in collaborative efforts to align the MVLD framework with parallel, complementary sequence variants interpretation clinical guidelines from the Association of Molecular Pathologists (AMP) for clinical labs (6). In order to truly democratize access to MolDx data for care and research needs, these standards must be harmonized to support sharing of clinical cancer variants. Here we describe the processes and methods developed within the ClinGen's Somatic WG in collaboration with over 60 cancer care and research organizations as well as CLIA-certified, CAP-accredited clinical testing labs to develop standards for cancer variant interpretation and sharing.

Subjects

Molecular Diagnostic Techniques/statistics & numerical data; Neoplasms/diagnosis; Neoplasms/genetics; Access to Information; Carcinoma, Pancreatic Ductal/diagnosis; Carcinoma, Pancreatic Ductal/genetics; Child; Computational Biology/methods; Databases, Genetic/statistics & numerical data; Gene Expression Profiling/statistics & numerical data; Genes, p53; Genetic Variation; High-Throughput Nucleotide Sequencing; Humans; Molecular Diagnostic Techniques/standards; Pancreatic Neoplasms/diagnosis; Pancreatic Neoplasms/genetics; Precision Medicine; Translational Research, Biomedical/standards; Translational Research, Biomedical/statistics & numerical data

14.

Standardizing And Democratizing Access To Cancer Molecular Diagnostic Test Data From Patients To Drive Translational Research.

Madhavan, Subha; Ritter, Deborah; Micheel, Christine; Rao, Shruti; Roy, Angshumoy; Sonkin, Dmitriy; McCoy, Matthew; Griffith, Malachi; Griffith, Obi L; McGarvey, Peter; Kulkarni, Shashikant.

AMIA Jt Summits Transl Sci Proc ; 2017: 152-159, 2018.

Article in English | MEDLINE | ID: mdl-29888062

ABSTRACT

In the last 3-5 years, there has been a rapid increase in clinical use of next generation sequencing (NGS) based cancer molecular diagnostic (MolDx) testing to develop better treatment plans with targeted therapies. To truly achieve precision oncology, it is critical to catalog cancer sequence variants from MolDx testing for their clinical relevance along with treatment information and patient outcomes, and to do so in a way that supports large-scale data aggregation and new hypothesis generation. Through the NIH-funded Clinical Genome Resource (ClinGen), in collaboration with NLM's ClinVar database and >50 academic and industry based cancer research organizations, a Minimal Variant Level Data (MVLD) framework to standardize reporting and interpretation of drug associated alterations was developed. Methodological and technology development to standardize and map MolDx data to the MVLD standard are presented here. Also described is a novel community engagement effort through disease-focused taskforces to provide usecases for technology development.

15.

iTextMine: integrated text-mining system for large-scale knowledge extraction from the literature.

Ren, Jia; Li, Gang; Ross, Karen; Arighi, Cecilia; McGarvey, Peter; Rao, Shruti; Cowart, Julie; Madhavan, Subha; Vijay-Shanker, K; Wu, Cathy H.

Database (Oxford) ; 2018, 2018 Jan 01.

Article in English | MEDLINE | ID: mdl-30576489

ABSTRACT

Numerous efforts have been made for developing text-mining tools to extract information from biomedical text automatically. They have assisted in many biological tasks, such as database curation and hypothesis generation. Text-mining tools are usually different from each other in terms of programming language, system dependency and input/output format. There are few previous works that concern the integration of different text-mining tools and their results from large-scale text processing. In this paper, we describe the iTextMine system with an automated workflow to run multiple text-mining tools on large-scale text for knowledge extraction. We employ parallel processing with dockerized text-mining tools with a standardized JSON output format and implement a text alignment algorithm to solve the text discrepancy for result integration. iTextMine presently integrates four relation extraction tools, which have been used to process all the Medline abstracts and PMC open access full-length articles. The website allows users to browse the text evidence and view integrated results for knowledge discovery through a network view. We demonstrate the utilities of iTextMine with two use cases involving the gene PTEN and breast cancer and the gene SATB1.

Subjects

Abstracting and Indexing/methods; Data Mining/methods; Publications; Software; Algorithms

16.

Eye-Tracking Study to Enhance Usability of Molecular Diagnostics Reports in Cancer Precision Medicine.

Sharma, Vishakha; Fong, Allan; Beckman, Robert A; Rao, Shruti; Boca, Simina M; McGarvey, Peter B; Ratwani, Raj M; Madhavan, Subha.

JCO Precis Oncol ; 2: 1-11, 2018 Nov.

Article in English | MEDLINE | ID: mdl-35135129

ABSTRACT

PURPOSE: We conducted usability studies on commercially available molecular diagnostic (MDX) test reports to identify strengths and weaknesses in content and form that drive clinical decision making. Given routine genomic testing in cancer medicine, oncologists must accurately interpret MDX reports, as well as evidence concerning the clinical utility of biomarkers, for treatment or trial selection. This work aims to evaluate the effectiveness of MDX reports in facilitating cancer treatment planning. METHODS: Fourteen clinicians at an academic tertiary care medical facility, with a wide range of experience in oncology and in the use of molecular testing, participated in this study. Three commercially available, widely used, Clinical Laboratory Improvement Amendments (CLIA)-certified, College of American Pathologists (CAP)-accredited test reports (labeled Laboratories A, B, and C) were used. Eye tracking, surveys, and think-aloud protocols were used to collect usability data for these MDX reports, focusing on ease of comprehension and actionability. RESULTS: Clinicians found two primary areas in molecular diagnostic reports most useful for patient care: therapy options with benefit or lack of benefit to patients, including enrolling clinical trials; and pathogenic tumor molecular anomalies detected. Therapeutic implications and therapy classes, such as US Food and Drug Administration-approved on-label, off-label, and clinical trial options, were critical for decision making. However, all reports had usability and comprehension issues in these areas and could be improved. CONCLUSION: Focused usability studies can help drive our understanding of the clinical workflow for use of molecular diagnostic tests in cancer care. This in turn can have major effects on quality of care, outcomes, costs, and patient satisfaction. This study demonstrates the use of specific usability techniques (eye tracking and think-aloud protocols) to help clinical laboratories improve MDX report design in a precision oncology treatment setting.

17.

eGARD: Extracting associations between genomic anomalies and drug responses from text.

Mahmood, A S M Ashique; Rao, Shruti; McGarvey, Peter; Wu, Cathy; Madhavan, Subha; Vijay-Shanker, K.

PLoS One ; 12(12): e0189663, 2017.

Article in English | MEDLINE | ID: mdl-29261751

ABSTRACT

Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially reports that describe the therapeutic implications of biomarkers and are therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for 'best-fit' therapies and readily generate hypotheses for new clinical trials.

Subjects

Genomics , Natural Language Processing , Neoplasms/genetics , Data Mining , Humans

18.

Distinct lymphocyte antigens 6 (Ly6) family members Ly6D, Ly6E, Ly6K and Ly6H drive tumorigenesis and clinical outcome.

Luo, Linlin; McGarvey, Peter; Madhavan, Subha; Kumar, Rakesh; Gusev, Yuriy; Upadhyay, Geeta.

Oncotarget ; 7(10): 11165-93, 2016 Mar 08.

Article in English | MEDLINE | ID: mdl-26862846

ABSTRACT

Stem cell antigen-1 (Sca-1) is used to isolate and characterize tumor-initiating cell populations from tumors of various murine models [1]. Sca-1-induced disruption of TGF-β signaling is required for in vivo tumorigenesis in breast cancer models [2, 3-5]. The role of the human Ly6 gene family is only beginning to be appreciated in recent literature [6-9]. To study the significance of Ly6 gene family members, we visualized 130 Gene Expression Omnibus (GEO) datasets using Oncomine (Invitrogen) and the Georgetown Database of Cancer (G-DOC). This analysis showed that four members, Ly6D, Ly6E, Ly6H and Ly6K, have increased gene expression in bladder, brain and CNS, breast, colorectal, cervical, ovarian, lung, head and neck, pancreatic and prostate cancers compared with their normal counterpart tissues. Increased expression of Ly6D, Ly6E, Ly6H or Ly6K was observed in a subset of cancer types. Increased expression of Ly6D, Ly6E, Ly6H and Ly6K was associated with poor outcome in ovarian, colorectal, gastric, breast, lung, bladder, and brain and CNS cancers, as observed with the KM plotter and PROGgeneV2 platforms. The findings of increased expression of Ly6 family members and its positive correlation with poor patient survival in multiple cancer types indicate that Ly6D, Ly6E, Ly6K and Ly6H will be important targets in clinical practice, both as markers of poor prognosis and for developing novel therapeutics.

Subjects

Antigens, Ly , Cell Transformation, Neoplastic , Datasets as Topic , Humans , Neoplasms

19.

Protein networks in induced sputum from smokers and COPD patients.

Baraniuk, James N; Casado, Begona; Pannell, Lewis K; McGarvey, Peter B; Boschetto, Piera; Luisetti, Maurizio; Iadarola, Paolo.

Int J Chron Obstruct Pulmon Dis ; 10: 1957-75, 2015.

Article in English | MEDLINE | ID: mdl-26396508

ABSTRACT

RATIONALE: Subtypes of cigarette smoke-induced disease affect different lung structures and may have distinct pathophysiological mechanisms. OBJECTIVE: To determine if proteomic classification of the cellular and vascular origins of sputum proteins can characterize these mechanisms and phenotypes. SUBJECTS AND METHODS: Individual sputum specimens from lifelong nonsmokers (n=7) and smokers with normal lung function (n=13), mucous hypersecretion with normal lung function (n=11), obstructed airflow without emphysema (n=15), and obstruction plus emphysema (n=10) were assessed with mass spectrometry. Data reduction, logarithmic transformation of spectral counts, and Cytoscape network-interaction analysis were performed. The original 203 proteins were reduced to the most informative 50. Sources were secretory dimeric IgA, submucosal gland serous and mucous cells, goblet and other epithelial cells, and vascular permeability. RESULTS: Epithelial proteins discriminated nonsmokers from smokers. Mucin 5AC was elevated in healthy smokers and chronic bronchitis, suggesting a continuum with the severity of hypersecretion determined by mechanisms of goblet-cell hyperplasia. Obstructed airflow was correlated with glandular proteins and lower levels of Ig joining chain compared to other groups. Emphysema subjects' sputum was unique, with high plasma proteins and components of neutrophil extracellular traps, such as histones and defensins. In contrast, defensins were correlated with epithelial proteins in all other groups. Protein-network interactions were unique to each group. CONCLUSION: The proteomes were interpreted as complex "biosignatures" that suggest distinct pathophysiological mechanisms for mucin 5AC hypersecretion, airflow obstruction, and inflammatory emphysema phenotypes. Proteomic phenotyping may improve genotyping studies by selecting more homogeneous study groups. Each phenotype may require its own mechanistically based diagnostic, risk-assessment, drug- and other treatment algorithms.

Subjects

Bronchitis, Chronic/metabolism , Mucin 5AC/metabolism , Pulmonary Disease, Chronic Obstructive/physiopathology , Pulmonary Emphysema/metabolism , Smoking/metabolism , Sputum/metabolism , Adult , Aged , Female , Forced Expiratory Volume , Humans , Immunoglobulin A, Secretory/blood , Male , Middle Aged , Mucus/metabolism , Proteomics

20.

Future of Evidence Synthesis in Precision Oncology: Between Systematic Reviews and Biocuration.

Boca, Simina M; Panagiotou, Orestis A; Rao, Shruti; McGarvey, Peter B; Madhavan, Subha.

JCO Precis Oncol ; 2, 2018.

Article in English | MEDLINE | ID: mdl-31930186
