1.
Viruses ; 14(8), 2022 08 21.
Article in English | MEDLINE | ID: mdl-36016459

ABSTRACT

Epitopes are short amino acid sequences that define the antigen signature to which an antibody or T cell receptor binds. In light of the current pandemic, epitope analysis and prediction are paramount to improving serological testing and developing vaccines. In this paper, known epitope sequences from SARS-CoV, SARS-CoV-2, and other Coronaviridae were leveraged to identify additional antigen regions in 62K SARS-CoV-2 genomes. Additionally, we present epitope distribution across SARS-CoV-2 genomes, locate the most commonly found epitopes, and discuss where epitopes are located on proteins and how epitopes can be grouped into classes. The mutation density of different protein regions is presented using a big data approach. It was observed that there are 112 B cell and 279 T cell conserved epitopes between SARS-CoV-2 and SARS-CoV, with more diverse sequences found in Nucleoprotein and Spike glycoprotein.
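
The abstract above does not say how known epitopes were located in the genomes. As a minimal sketch, assuming plain exact substring matching against predicted protein sequences (an assumption, not the authors' stated method), the scan and a per-epitope conservation count could look like this. The function names and toy sequences are hypothetical.

```python
def find_epitope_hits(epitopes, proteins):
    """Map each epitope to the (protein_id, start_index) pairs where it occurs exactly."""
    hits = {}
    for epitope in epitopes:
        found = []
        for protein_id, sequence in proteins.items():
            start = sequence.find(epitope)
            while start != -1:
                found.append((protein_id, start))
                # Continue searching one residue further to catch overlapping hits
                start = sequence.find(epitope, start + 1)
        hits[epitope] = found
    return hits


def conservation_counts(hits):
    """Count how many distinct proteins contain each epitope at least once."""
    return {ep: len({pid for pid, _ in locs}) for ep, locs in hits.items()}


# Toy sequences invented for illustration; they are not real viral proteins.
proteins = {
    "genome_A_spike": "MFVFLVLLPLVSSQCVNLT",
    "genome_B_spike": "MFVFLVLLPLVSSQCVNFT",
}
epitopes = ["LVLLPLV", "CVNLT"]

hits = find_epitope_hits(epitopes, proteins)
counts = conservation_counts(hits)
```

An epitope found in every sequence is conserved in this toy sense, while one disrupted by a single substitution drops out. A real analysis would likely also need to tolerate near matches.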


Subjects
COVID-19 , Viral Vaccines , COVID-19 Vaccines , B-Lymphocyte Epitopes , T-Lymphocyte Epitopes , Humans , SARS-CoV-2/genetics , Coronavirus Spike Glycoprotein
2.
Article in English | MEDLINE | ID: mdl-32877338

ABSTRACT

The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics. With the increasing availability of genomic data, traditional bioinformatic tools require substantial computational time and the creation of ever-larger indices each time a researcher seeks to gain insight from the data. To address these challenges, we pre-computed important relationships between biological entities spanning the Central Dogma of Molecular Biology and captured this information in a relational database. The database can be queried across hundreds of millions of entities and returns results in a fraction of the time required by traditional methods. In this paper, we describe the Functional Genomics Platform (formerly known as OMXWare), a comprehensive database relating genotype to phenotype for bacterial life. Continually updated, the Functional Genomics Platform today contains data derived from 200,000 curated, self-consistently assembled genomes. The database stores functional data for over 68 million genes, 52 million proteins, and 239 million domains with associated biological activity annotations from Gene Ontology, KEGG, MetaCyc, and Reactome. The Functional Genomics Platform maps all of the many-to-many connections between biological entities, including the originating genome, gene, protein, and protein domain. Various microbial studies, from infectious disease to environmental health, can benefit from the rich data and connections. We describe the data selection, the pipeline to create and update the Functional Genomics Platform, and the developer tools (Python SDK and REST APIs), which allow researchers to efficiently study microbial life at scale.
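
To illustrate what pre-computed many-to-many connections in a relational database can look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are hypothetical and are not the platform's actual schema or SDK; the sketch only shows the junction-table pattern the abstract alludes to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genome  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT);
-- Junction tables store the many-to-many links between entities
CREATE TABLE genome_gene (
    genome_id INTEGER REFERENCES genome(id),
    gene_id   INTEGER REFERENCES gene(id),
    PRIMARY KEY (genome_id, gene_id)
);
CREATE TABLE gene_protein (
    gene_id    INTEGER REFERENCES gene(id),
    protein_id INTEGER REFERENCES protein(id),
    PRIMARY KEY (gene_id, protein_id)
);
""")

# Toy rows invented for illustration
conn.executemany("INSERT INTO genome VALUES (?, ?)", [(1, "genome_1"), (2, "genome_2")])
conn.executemany("INSERT INTO gene VALUES (?, ?)", [(10, "gene_x"), (11, "gene_y")])
conn.executemany("INSERT INTO protein VALUES (?, ?)", [(100, "protein_p")])
conn.executemany("INSERT INTO genome_gene VALUES (?, ?)", [(1, 10), (2, 10), (2, 11)])
conn.executemany("INSERT INTO gene_protein VALUES (?, ?)", [(10, 100), (11, 100)])


def genomes_with_protein(connection, protein_name):
    """Return the names of all genomes linked to a protein through any gene."""
    rows = connection.execute(
        """
        SELECT DISTINCT g.name
        FROM genome g
        JOIN genome_gene gg  ON gg.genome_id = g.id
        JOIN gene_protein gp ON gp.gene_id = gg.gene_id
        JOIN protein p       ON p.id = gp.protein_id
        WHERE p.name = ?
        ORDER BY g.name
        """,
        (protein_name,),
    ).fetchall()
    return [row[0] for row in rows]


result = genomes_with_protein(conn, "protein_p")
```

Because the links are stored rather than recomputed, a question such as "which genomes carry this protein" becomes a join over indexed keys instead of a fresh sequence search.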


Subjects
Genetic Databases , Software , Cloud Computing , Genome , Genomics/methods
3.
Viruses ; 13(12), 2021 12 03.
Article in English | MEDLINE | ID: mdl-34960694

ABSTRACT

SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on the use of a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. We analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world using our method and identified the comprehensive set of known proteins with 98.5% set membership accuracy and 99.1% accuracy in length prediction, compared to proteome references, including Replicase polyprotein 1ab (with its transcriptional slippage site). Compared to other published tools, such as Prokka (base) and VAPiD, we obtained a 6.4- and 1.8-fold increase in protein annotations, respectively. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant-diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we successfully identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy and demonstrated high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.


Subjects
COVID-19/virology , Viral Genome , Molecular Sequence Annotation , SARS-CoV-2/genetics , Amino Acid Sequence , Base Sequence , Computational Biology , Humans , Mutation , Protein Binding , Protein Domains , Coronavirus Spike Glycoprotein/genetics
5.
Patterns (N Y) ; 2(6): 100269, 2021 Jun 11.
Article in English | MEDLINE | ID: mdl-33969323

ABSTRACT

Although a plethora of research articles on AI methods for COVID-19 medical imaging have been published, their clinical value remains unclear. We conducted the largest systematic review of the literature addressing the utility of AI in imaging for COVID-19 patient care. By keyword searches on PubMed and preprint servers throughout 2020, we identified 463 manuscripts and performed a systematic meta-analysis to assess their technical merit and clinical relevance. Our analysis shows a significant disparity between the clinical and AI communities, both in the imaging modalities studied (AI experts neglected CT and ultrasound, favoring X-ray) and in the tasks performed (71.9% of AI papers centered on diagnosis). The vast majority of manuscripts were found to be deficient regarding potential use in clinical practice, but 2.7% (n = 12) of publications were assigned a high maturity level and are summarized in greater detail. We provide an itemized discussion of the challenges in developing clinically relevant AI solutions with recommendations and remedies.

6.
Sci Rep ; 11(1): 8988, 2021 04 26.
Article in English | MEDLINE | ID: mdl-33903676

ABSTRACT

Rapid tests for active SARS-CoV-2 infections rely on reverse transcription polymerase chain reaction (RT-PCR). RT-PCR uses reverse transcription of RNA into complementary DNA (cDNA) and amplification of specific DNA (primer and probe) targets using polymerase chain reaction (PCR). The technology makes rapid and specific identification of the virus possible based on nucleic acid sequence homology and is much faster than tissue culture or animal cell models. However, the technique can lose sensitivity over time as the virus evolves and the target sequences diverge from the selective primer sequences. Different primer sequences have been adopted in different geographic regions. As we rely on these existing RT-PCR primers to track and manage the spread of the coronavirus, it is imperative to understand how SARS-CoV-2 mutations, over time and geographically, diverge from the primers in use today. In this study, we analyze the performance of the SARS-CoV-2 primers in use today by measuring the number of mismatches between primer sequence and genome targets over time and across geographic regions. We find a growing number of mismatches, increasing by 2% per month, as well as high specificity of the virus based on geographic location.
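
The mismatch measurement described above can be sketched as a sliding-window Hamming distance: for each genome, find the primer-length window that differs from the primer at the fewest positions. This is an illustrative simplification, not the study's pipeline. It assumes ungapped alignment and a primer given in the same orientation as the genome (reverse primers would first need reverse-complementing), and the sequences below are invented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(primer, genome):
    """Return (mismatch_count, start) for the best ungapped primer placement."""
    if len(primer) > len(genome):
        raise ValueError("primer is longer than the genome sequence")
    best_count, best_start = len(primer) + 1, -1
    for start in range(len(genome) - len(primer) + 1):
        count = hamming(primer, genome[start:start + len(primer)])
        if count < best_count:
            best_count, best_start = count, start
    return best_count, best_start


# Toy sequences invented for illustration
primer = "ACGTAC"
early_genome = "TTACGTACGG"  # contains the primer target exactly
late_genome = "TTACGAACGG"   # same region with one substitution

early = min_mismatches(primer, early_genome)
late = min_mismatches(primer, late_genome)
```

Grouping such counts by collection date or region would give the kind of temporal and geographic trend the abstract reports.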


Subjects
DNA Primers/genetics , DNA Probes/genetics , Reverse Transcriptase Polymerase Chain Reaction/methods , SARS-CoV-2/genetics , Viral Genome , Mutation
7.
Sci Data ; 8(1): 92, 2021 03 25.
Article in English | MEDLINE | ID: mdl-33767191

ABSTRACT

We developed a rich dataset of Chest X-Ray (CXR) images to assist investigators in artificial intelligence. The data were collected using an eye-tracking system while a radiologist reviewed and reported on 1,083 CXR images. The dataset contains the following aligned data: CXR image, transcribed radiology report text, radiologist's dictation audio and eye gaze coordinates data. We hope this dataset can contribute to various areas of research particularly towards explainable and multimodal deep learning/machine learning methods. Furthermore, investigators in disease classification and localization, automated radiology report generation, and human-machine interaction can benefit from these data. We report deep learning experiments that utilize the attention maps produced by the eye gaze dataset to show the potential utility of this dataset.


Subjects
Deep Learning , Thorax/diagnostic imaging , Humans , Radiography
8.
Sci Rep ; 11(1): 139, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33420322

ABSTRACT

Liver cancer is one of the leading causes of cancer deaths in Asia and Africa. Hepatocellular carcinoma (HCC) accounts for almost 90% of all cases. HCC is a malignant tumor and the most common histological type of primary liver cancer. The detection and evaluation of viable tumor regions in HCC are of clinical importance, since they are a key step in assessing response to chemoradiotherapy and tumor cell proportion in genetic tests. Recent advances in computer vision, digital pathology and microscopy imaging enable automatic histopathology image analysis for cancer diagnosis. In this paper, we present a multi-resolution deep learning model, HistoCAE, for viable tumor segmentation in whole-slide liver histopathology images. We propose a convolutional autoencoder (CAE)-based framework with a customized reconstruction loss function for image reconstruction, followed by a classification module to classify each image patch as tumor versus non-tumor. The resulting patch-based predictions are spatially combined to generate the final segmentation result for each whole-slide image (WSI). Additionally, the spatially organized encoded feature map derived from small image patches is used to compress the gigapixel whole-slide images. Our proposed model outperforms other benchmark models in extensive experiments, suggesting its efficacy for viable tumor area segmentation in liver whole-slide images.
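
The step of spatially combining patch-level predictions into a slide-level segmentation can be sketched as follows. This is not HistoCAE itself: the classifier is omitted, patches are assumed to be non-overlapping squares on a regular grid, and the function name and toy labels are hypothetical.

```python
def stitch_patch_predictions(patch_labels, patch_size, height, width):
    """Paint each patch's 0/1 label onto a height x width mask.

    patch_labels maps (patch_row, patch_col) grid indices to a label,
    where 1 means tumor and 0 means non-tumor.
    """
    mask = [[0] * width for _ in range(height)]
    for (patch_row, patch_col), label in patch_labels.items():
        y_start = patch_row * patch_size
        x_start = patch_col * patch_size
        # Clip at the image border so partial edge patches are handled
        for y in range(y_start, min(y_start + patch_size, height)):
            for x in range(x_start, min(x_start + patch_size, width)):
                mask[y][x] = label
    return mask


# Toy 4 x 4 "slide" split into four 2 x 2 patches
patch_labels = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
mask = stitch_patch_predictions(patch_labels, patch_size=2, height=4, width=4)
```

With overlapping patches, a common alternative is to average per-pixel scores before thresholding rather than overwrite them.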


Subjects
Deep Learning , Liver Neoplasms/diagnostic imaging , Hepatocellular Carcinoma/diagnostic imaging , Hepatocellular Carcinoma/pathology , Humans , Computer-Assisted Image Processing , Liver/diagnostic imaging , Liver/pathology , Liver Neoplasms/pathology
9.
AMIA Annu Symp Proc ; 2021: 571-580, 2021.
Article in English | MEDLINE | ID: mdl-35308964

ABSTRACT

Patient Electronic Health Records (EHRs) typically contain a substantial amount of data, which can lead to information overload for clinicians, especially in high-throughput fields like radiology. Thus, it would be beneficial to have a mechanism for summarizing the most clinically relevant patient information pertinent to the needs of clinicians. This study presents a novel approach for the curation of clinician EHR data preference information towards the ultimate goal of providing robust EHR summarization. Clinicians first provide a list of data items of interest across multiple EHR categories. Since this data is manually dictated, it has limited coverage and may not cover all the important terms relevant to a concept. To address this problem, we have developed a knowledge-driven semantic concept expansion approach by leveraging rich biomedical knowledge from the UMLS. The approach expands 1094 seed concepts to 22,325 concepts with 92.69% of the expanded concepts identified as relevant by clinicians.


Subjects
Electronic Health Records , Semantics , Humans
10.
AMIA Annu Symp Proc ; 2018: 205-214, 2018.
Article in English | MEDLINE | ID: mdl-30815058

ABSTRACT

Much of the critical information in a patient's electronic health record (EHR) is hidden in unstructured text. As such, there is an increasing role for automated text extraction and summarization to make this information available in a way that can be quickly and easily understood. While many clinical note text extraction techniques have been examined, most existing techniques are either narrowly targeted or focus primarily on concept-level extraction, potentially missing important contextual information. In contrast, in this work we examine the extraction of several clinical categories at the phrase level, attempting to provide the necessary context while still keeping the extracted elements concise. To do so, we employ a three-stage pipeline which extracts categorized phrases of interest using clinical concepts as anchor points. Results suggest the proposed method achieves performance comparable to that of individual human annotators.


Subjects
Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Humans
11.
AMIA Annu Symp Proc ; 2018: 518-526, 2018.
Article in English | MEDLINE | ID: mdl-30815092

ABSTRACT

EMR systems are intended to improve patient-centered care management and hospital administrative processing. However, the information stored in EMRs can be disorganized, incomplete, or inconsistent, creating problems at the patient and system level. We present a technology that reconciles inconsistencies between clinical diagnoses and administrative records by analyzing free-text notes, problem lists and recorded diagnoses in real time. A fully integrated pipeline has been developed for efficient, knowledge-driven extraction, normalization, and matching of disease terms among structured and unstructured data, with modular precision of 94-98% on over 1000 patients. This cognitive data review tool improves the path from diagnosis to documentation, facilitating accurate and timely clinical and administrative decision-making.


Subjects
Disease , Electronic Health Records , Information Storage and Retrieval/methods , Terminology as Topic , Algorithms , Cognition , Diagnosis , Documentation , Humans