Results 1 - 16 of 16
1.
Nature ; 604(7906): 437-446, 2022 04.
Article in English | MEDLINE | ID: mdl-35444317

ABSTRACT

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.


Subjects
Genome, Human; Genomics; Genome, Human/genetics; Haplotypes/genetics; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
2.
Nucleic Acids Res ; 51(D1): D1300-D1311, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350676

ABSTRACT

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.


Subjects
Genome, Human; Software; Humans; Molecular Sequence Annotation; Genomics; Genotype; Genetic Variation
3.
Cell Syst ; 12(11): 1108-1120.e4, 2021 11 17.
Article in English | MEDLINE | ID: mdl-34464590

ABSTRACT

Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data are secure while in transit, at rest, and in analysis, and can be decrypted only by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those for the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.


Subjects
Outsourced Services; Computer Security; Genome-Wide Association Study; Genotype; Privacy
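
The idea in the abstract above, evaluating an imputation model on encrypted tag genotypes so that only the data owner can read the result, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the encryption scheme or the model, so this sketch swaps in a toy Paillier cryptosystem (additively homomorphic) and a linear model with non-negative integer weights. The key size is far too small for real use, and the function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n, using g = n + 1."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext with the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def impute_encrypted(n, enc_tags, weights, intercept):
    """Server side: compute intercept + sum(w_i * x_i) on ciphertexts only."""
    n2 = n * n
    acc = pow(n + 1, intercept, n2)  # encryption of the public intercept (r = 1)
    for c, w in zip(enc_tags, weights):
        # Enc(x)^w encrypts w * x; multiplying ciphertexts adds plaintexts.
        acc = acc * pow(c, w, n2) % n2
    return acc
```

In this sketch the owner would encrypt tag genotypes coded as 0, 1, or 2, send the ciphertexts to the service, and decrypt the returned score locally; the service never handles a plaintext genotype or the private key.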
4.
Cell Genom ; 1(2): None, 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-34820659

ABSTRACT

Human biomedical datasets that are critical for research and clinical studies to benefit human health also often contain sensitive or potentially identifying information of individual participants. Thus, care must be taken when they are processed and made available to comply with ethical and regulatory frameworks and informed consent data conditions. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human and machine-readable data use terms that consistently and unambiguously represents a dataset's allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used in annotation of over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers' discovery and access of relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkages using common data use profiles when integrating the data for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide.

5.
Mol Microbiol ; 74(3): 557-81, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19737356

ABSTRACT

The ability of a bacterial cell to monitor and adaptively respond to its environment is crucial for survival. After one- and two-component systems, extracytoplasmic function (ECF) sigma factors - the largest group of alternative sigma factors - represent the third fundamental mechanism of bacterial signal transduction, with about six such regulators on average per bacterial genome. Together with their cognate anti-sigma factors, they represent a highly modular design that primarily facilitates transmembrane signal transduction. A comprehensive analysis of the ECF sigma factor protein family identified more than 40 distinct major groups of ECF sigma factors. The functional relevance of this classification is supported by the sequence similarity and domain architecture of cognate anti-sigma factors, genomic context conservation, and potential target promoter motifs. Moreover, this phylogenetic analysis revealed unique features indicating novel mechanisms of ECF-mediated signal transduction. This classification, together with the web tool ECFfinder and the information stored in the Microbial Signal Transduction (MiST) database, provides a comprehensive resource for the analysis of ECF sigma factor-dependent gene regulation.


Subjects
Bacteria/metabolism; Bacterial Proteins/metabolism; Sigma Factor/metabolism; Signal Transduction; Amino Acid Motifs/genetics; Amino Acid Sequence; Bacteria/genetics; Bacterial Proteins/classification; Bacterial Proteins/genetics; Gene Expression Profiling; Gene Expression Regulation, Bacterial; Genes, Bacterial; Genome, Bacterial; Genomics; Mycobacterium tuberculosis/genetics; Mycobacterium tuberculosis/metabolism; Protein Kinases/genetics; Protein Kinases/metabolism; Protein Structure, Tertiary/genetics; RNA, Bacterial/analysis; RNA, Bacterial/genetics; Reverse Transcriptase Polymerase Chain Reaction; Sequence Alignment; Sigma Factor/classification; Sigma Factor/genetics; Signal Transduction/genetics; Virulence Factors/genetics
6.
J Am Med Inform Assoc ; 27(11): 1721-1726, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32918447

ABSTRACT

Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of passionate researchers came together with a joint mission to solve the problem with our finest models and tools. The SCOR Consortium has developed a ready-to-deploy secure infrastructure using world-class privacy and security technologies to reconcile the privacy/utility conflicts. We hope our effort will make a change and accelerate research in future pandemics with broad and diverse samples on an international scale.


Subjects
Biomedical Research; Computer Security; Coronavirus Infections; Information Dissemination; Pandemics; Pneumonia, Viral; Privacy; COVID-19; Humans; Information Dissemination/ethics; Internationality; Machine Learning
7.
BMC Bioinformatics ; 9 Suppl 6: S6, 2008 May 28.
Article in English | MEDLINE | ID: mdl-18541059

ABSTRACT

BACKGROUND: Graphs and networks are common analysis representations for biological systems. Many traditional graph algorithms such as k-clique, k-coloring, and subgraph matching have great potential as analysis techniques for newly available data in biology. Yet, as the amount of genomic and bionetwork information rapidly grows, scientists need advanced new computational strategies and tools for dealing with the complexities of the bionetwork analysis and the volume of the data. RESULTS: We introduce a computational framework for graph analysis called the Biological Graph Environment (BioGraphE), which provides a general, scalable integration platform for connecting graph problems in biology to optimized computational solvers and high-performance systems. This framework enables biology researchers and computational scientists to identify and deploy network analysis applications and to easily connect them to efficient and powerful computational software and hardware that are specifically designed and tuned to solve complex graph problems. In our particular application of BioGraphE to support network analysis in genome biology, we investigate the use of a Boolean satisfiability solver known as Survey Propagation as a core computational solver executing on standard high-performance parallel systems, as well as multi-threaded architectures. CONCLUSION: In our application of BioGraphE to conduct bionetwork analysis of homology networks, we found that BioGraphE and a custom, parallel implementation of the Survey Propagation SAT solver were capable of solving very large bionetwork problems at high rates of execution on different high-performance computing platforms.


Subjects
Algorithms; Computer Graphics; Models, Biological; Proteome/metabolism; Signal Transduction/physiology; Software; Computer Simulation
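
The abstract above describes connecting graph problems such as k-coloring to a Boolean satisfiability solver. As a minimal sketch of that reduction, the code below encodes k-coloring as CNF clauses and solves them. The solver is an assumption swapped in for illustration: it is an exhaustive search suitable only for tiny graphs, not Survey Propagation, and the function names are hypothetical rather than part of BioGraphE.

```python
from itertools import product


def var(v, c, k):
    """1-based variable id meaning 'vertex v has color c'."""
    return v * k + c + 1


def coloring_to_cnf(n_vertices, edges, k):
    """Encode k-coloring of a graph as a list of CNF clauses."""
    clauses = []
    for v in range(n_vertices):
        # Each vertex gets at least one color ...
        clauses.append([var(v, c, k) for c in range(k)])
        # ... and at most one color.
        for c1 in range(k):
            for c2 in range(c1 + 1, k):
                clauses.append([-var(v, c1, k), -var(v, c2, k)])
    for u, v in edges:
        # Adjacent vertices must not share a color.
        for c in range(k):
            clauses.append([-var(u, c, k), -var(v, c, k)])
    return clauses


def brute_force_sat(clauses, n_vars):
    """Try every assignment; return a satisfying one or None."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


def color_graph(n_vertices, edges, k):
    """Return a list of colors per vertex, or None if no k-coloring exists."""
    model = brute_force_sat(coloring_to_cnf(n_vertices, edges, k), n_vertices * k)
    if model is None:
        return None
    return [next(c for c in range(k) if model[v * k + c])
            for v in range(n_vertices)]
```

For a triangle, `color_graph(3, [(0, 1), (1, 2), (0, 2)], 2)` finds no solution, while the same call with `k=3` returns three distinct colors. The encoding is independent of the solver, which is what would let a scalable solver replace the exhaustive search.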
8.
Sci Data ; 5: 180039, 2018 03 14.
Article in English | MEDLINE | ID: mdl-29537396

ABSTRACT

The volume of genomics and health data is growing rapidly, driven by sequencing for both research and clinical use. However, under current practices, the data is fragmented into many distinct datasets, and researchers must go through a separate application process for each dataset. This is time-consuming both for the researchers and the data stewards, and it reduces the velocity of research and new discoveries that could improve human health. We propose to simplify this process, by introducing a standard Library Card that identifies and authenticates researchers across all participating datasets. Each researcher would only need to apply once to establish their bona fides as a qualified researcher, and could then use the Library Card to access a wide range of datasets that use a compatible data access policy and authentication protocol.

9.
Cell Syst ; 6(3): 271-281.e7, 2018 03 28.
Article in English | MEDLINE | ID: mdl-29596782

ABSTRACT

The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.


Subjects
Genomics/methods; Neoplasms/genetics; Sequence Analysis, DNA/methods; Algorithms; Exome; High-Throughput Nucleotide Sequencing/methods; Humans; Information Dissemination/methods; Mutation; Software; Exome Sequencing/methods
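
The abstract above mentions applying an ensemble of mutation-calling algorithms with scoring. A minimal sketch of one common way to combine callers, a support-count vote, follows. The threshold, the caller names, and the variant tuple layout are assumptions for illustration; the abstract does not specify the project's actual scoring or filtering rules.

```python
from collections import Counter


def ensemble_calls(calls_by_caller, min_support=2):
    """Keep variants reported by at least `min_support` callers.

    calls_by_caller maps a caller name to an iterable of variants, each a
    hashable tuple such as (chrom, pos, ref, alt). Returns a dict mapping
    each retained variant to the number of callers that reported it.
    """
    support = Counter()
    for variants in calls_by_caller.values():
        # Deduplicate per caller so one caller contributes at most one vote.
        support.update(set(variants))
    return {v: n for v, n in support.items() if n >= min_support}
```

With three hypothetical callers where only one variant is reported by more than one of them, the function returns that variant with its support count, which can then serve as a simple confidence score for downstream filtering.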
10.
NPJ Genom Med ; 2: 33, 2017.
Article in English | MEDLINE | ID: mdl-29263842

ABSTRACT

The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances on privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track (1) data sharing through the Beacon service of the Global Alliance for Genomics and Health; Track (2) collaborative discovery of similar genomes between two institutions; and Track (3) data outsourcing to public cloud services. The latter two tracks represent continuing themes from our 2015 competition, while the former was new and a response to a recently established vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.

11.
J Am Med Inform Assoc ; 24(4): 799-805, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28339683

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context: a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertain and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual's whole genome sequence), the individual's membership in a beacon can be inferred through repeated queries for variants present in the individual's genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.


Subjects
Data Anonymization; Genetic Privacy; Information Dissemination; Genomics; Humans
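
The third strategy in the abstract above, budgeting the number of accesses per user for each individual genome, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: every "yes" answer charges each contributing genome one unit regardless of allele rarity, and the class name, data layout, and budget value are hypothetical.

```python
from collections import defaultdict


class BudgetedBeacon:
    """Toy beacon answering yes/no allele-presence queries under a budget."""

    def __init__(self, genomes, budget):
        # genomes: dict mapping individual id -> set of (chrom, pos, allele)
        self.genomes = genomes
        # Maximum number of "yes" answers one genome may support per user.
        self.budget = budget
        # spent[user][individual] counts how often that individual's genome
        # has contributed to a "yes" answer given to that user.
        self.spent = defaultdict(lambda: defaultdict(int))

    def query(self, user, chrom, pos, allele):
        key = (chrom, pos, allele)
        # Only genomes with remaining budget for this user may answer.
        carriers = [ind for ind, variants in self.genomes.items()
                    if key in variants and self.spent[user][ind] < self.budget]
        for ind in carriers:
            self.spent[user][ind] += 1
        return bool(carriers)
```

Once a user has exhausted an individual's budget, that genome stops contributing to the user's answers, so repeated queries for variants carried only by that individual begin to return "no". This limits how much evidence a single user can accumulate about one person's membership.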
12.
FEMS Microbiol Rev ; 27(5): 559-92, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14638413

ABSTRACT

The Crp-Fnr regulators, named after the first two identified members, are DNA-binding proteins which predominantly function as positive transcription factors, though roles of repressors are also important. Among over 1200 proteins with an N-terminally located nucleotide-binding domain similar to the cyclic adenosine monophosphate (cAMP) receptor protein, the distinctive additional trait of the Crp-Fnr superfamily is a C-terminally located helix-turn-helix motif for DNA binding. From a curated database of 369 family members exhibiting both features, we provide a protein tree of Crp-Fnr proteins according to their phylogenetic relationships. This results in the assembly of the regulators ArcR, CooA, CprK, Crp, Dnr, FixK, Flp, Fnr, FnrN, MalR, NnrR, NtcA, PrfA, and YeiL and their homologs in distinct clusters. Lead members and representatives of these groups are described, placing emphasis on the less well-known regulators and target processes. Several more groups consist of sequence-derived proteins of unknown physiological roles; some of them are tight clusters of highly similar members. The Crp-Fnr regulators stand out in responding to a broad spectrum of intracellular and exogenous signals such as cAMP, anoxia, the redox state, oxidative and nitrosative stress, nitric oxide, carbon monoxide, 2-oxoglutarate, or temperature. To accomplish their roles, Crp-Fnr members have intrinsic sensory modules allowing the binding of allosteric effector molecules, or have prosthetic groups for the interaction with the signal. The regulatory adaptability and structural flexibility represented in the Crp-Fnr scaffold has led to the evolution of an important group of physiologically versatile transcription factors.


Subjects
Cyclic AMP Receptor Protein/genetics; Cyclic AMP Receptor Protein/metabolism; Escherichia coli Proteins/genetics; Escherichia coli Proteins/metabolism; Iron-Sulfur Proteins/genetics; Iron-Sulfur Proteins/metabolism; Phylogeny; Gene Expression Regulation, Bacterial; Gram-Negative Bacteria/genetics; Gram-Negative Bacteria/metabolism; Gram-Positive Bacteria/genetics; Gram-Positive Bacteria/metabolism; Transcriptional Activation
13.
BMC Med Genomics ; 9(1): 63, 2016 10 13.
Article in English | MEDLINE | ID: mdl-27733153

ABSTRACT

The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond the agreed research scope and being processed in untrusted computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, 'anonymization' and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) is needed before they are sufficiently practical for real-world environments.


Subjects
Cloud Computing; Computer Security; Genomics; Genome-Wide Association Study
15.
Biomol NMR Assign ; 2(1): 25-8, 2008 Jun.
Article in English | MEDLINE | ID: mdl-19636916

ABSTRACT

Cyanothece 51142 contains a 78-residue protein, cce_0567, that falls into the DUF683 family of proteins associated with nitrogen fixation. Here we report the assignment of most of the main chain and 13C(beta) side chain resonances of the approximately 40 kDa homo-tetramer.


Subjects
Bacterial Proteins/chemistry; Cyanothece/metabolism; Magnetic Resonance Spectroscopy/methods; Nitrogen Fixation; Amino Acid Sequence; Carbon Isotopes/chemistry; Molecular Sequence Data; Molecular Weight; Nitrogen Isotopes/chemistry; Protons
16.
Mol Cell ; 27(5): 793-805, 2007 Sep 07.
Article in English | MEDLINE | ID: mdl-17803943

ABSTRACT

A transcriptional response to singlet oxygen in Rhodobacter sphaeroides is controlled by the group IV sigma factor sigma(E) and its cognate anti-sigma ChrR. Crystal structures of the sigma(E)/ChrR complex reveal a modular, two-domain architecture for ChrR. The ChrR N-terminal anti-sigma domain (ASD) binds a Zn(2+) ion, contacts sigma(E), and is sufficient to inhibit sigma(E)-dependent transcription. The ChrR C-terminal domain adopts a cupin fold, can coordinate an additional Zn(2+), and is required for the transcriptional response to singlet oxygen. Structure-based sequence analyses predict that the ASD defines a common structural fold among predicted group IV anti-sigmas. These ASDs are fused to diverse C-terminal domains that are likely involved in responding to specific environmental signals that control the activity of their cognate sigma factor.


Subjects
Bacterial Proteins/chemistry; Rhodobacter sphaeroides/genetics; Sigma Factor/chemistry; Transcription Factors/chemistry; Transcription, Genetic/physiology; Amino Acid Sequence; Bacterial Proteins/metabolism; Bacterial Proteins/physiology; Binding Sites; Crystallography, X-Ray; Gene Expression Regulation, Bacterial; Models, Molecular; Molecular Sequence Data; Oxygen/metabolism; Protein Folding; Protein Structure, Tertiary; Rhodobacter sphaeroides/metabolism; Sequence Alignment; Sigma Factor/physiology; Transcription Factors/physiology; Zinc/metabolism