Results 1 - 20 of 43
1.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Subjects
Algorithms; Image Processing, Computer-Assisted; Machine Learning; Semantics
2.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
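One widely known pitfall of the kind this abstract refers to can be shown in a few lines. The sketch below is illustrative only and is not taken from the paper: it assumes a toy binary segmentation with 5 foreground pixels out of 100, and the function names `accuracy` and `dice` are hypothetical helpers written for this example.

```python
def accuracy(truth, pred):
    """Fraction of pixels whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def dice(truth, pred):
    """Dice similarity coefficient for binary labels (1 = foreground)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Undefined when both masks are empty; report NaN rather than guess.
    return 2 * tp / denom if denom else float("nan")


# Toy data (assumed): a small structure on a large background.
truth = [1] * 5 + [0] * 95
pred = [0] * 100  # a "model" that never predicts foreground

print(accuracy(truth, pred))  # 0.95
print(dice(truth, pred))      # 0.0
```

Pixel accuracy rewards the empty prediction because the background dominates, while the overlap-based score exposes that the target structure was missed entirely.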


Subjects
Artificial Intelligence
3.
PLoS Biol ; 21(1): e3001949, 2023 01.
Article in English | MEDLINE | ID: mdl-36693044

ABSTRACT

The state of open science needs to be monitored to track changes over time and identify areas to create interventions to drive improvements. In order to monitor open science practices, they first need to be well defined and operationalized. To reach consensus on what open science practices to monitor at biomedical research institutions, we conducted a modified 3-round Delphi study. Participants were research administrators, researchers, specialists in dedicated open science roles, and librarians. In rounds 1 and 2, participants completed an online survey evaluating a set of potential open science practices, and for round 3, we hosted two half-day virtual meetings to discuss and vote on items that had not reached consensus. Ultimately, participants reached consensus on 19 open science practices. This core set of open science practices will form the foundation for institutional dashboards and may also be of value for the development of policy, education, and interventions.


Subjects
Biomedical Research; Humans; Consensus; Delphi Technique; Surveys and Questionnaires; Research Design
4.
Nucleic Acids Res ; 51(19): 10109-10131, 2023 10 27.
Article in English | MEDLINE | ID: mdl-37738673

ABSTRACT

Enhancer reprogramming has been proposed as a key source of transcriptional dysregulation during tumorigenesis, but the molecular mechanisms underlying this process remain unclear. Here, we identify an enhancer cluster required for normal development that is aberrantly activated in breast and lung adenocarcinoma. Deletion of the SRR124-134 cluster disrupts expression of the SOX2 oncogene, dysregulates genome-wide transcription and chromatin accessibility and reduces the ability of cancer cells to form colonies in vitro. Analysis of primary tumors reveals a correlation between chromatin accessibility at this cluster and SOX2 overexpression in breast and lung cancer patients. We demonstrate that FOXA1 is an activator and NFIB is a repressor of SRR124-134 activity and SOX2 transcription in cancer cells, revealing a co-opting of the regulatory mechanisms involved in early development. Notably, we show that the conserved SRR124 and SRR134 regions are essential during mouse development, where homozygous deletion results in the lethal failure of esophageal-tracheal separation. These findings provide insights into how developmental enhancers can be reprogrammed during tumorigenesis and underscore the importance of understanding enhancer dynamics during development and disease.


The manuscript by Abatti et al. shows that epigenetic reactivation of a pair of distal enhancers that drive Sox2 expression during development (to permit separation of the esophagus and trachea) is responsible for the tumor-promoting re-expression of SOX2 in breast and lung tumors. Intriguingly, the same transcription factors that act on the enhancers during development to either activate or repress them (i.e. FOXA1 and NFIB, respectively) are also required for altering chromatin accessibility of the enhancers and SOX2 transcription in breast and lung cancer cells. With their work, the authors unravel the exact mechanism of how developmentally active enhancers become repurposed in a tumor context and show the relevance of this repurposing event for cancer.


Subjects
Adenocarcinoma of Lung; Lung Neoplasms; SOXB1 Transcription Factors; Animals; Humans; Mice; Adenocarcinoma of Lung/genetics; Carcinogenesis/genetics; Chromatin/genetics; Enhancer Elements, Genetic; Epigenesis, Genetic; Homozygote; Lung Neoplasms/genetics; Sequence Deletion; SOXB1 Transcription Factors/genetics; SOXB1 Transcription Factors/metabolism
5.
Nature ; 563(7732): 579-583, 2018 11.
Article in English | MEDLINE | ID: mdl-30429608

ABSTRACT

The use of liquid biopsies for cancer detection and management is rapidly gaining prominence [1]. Current methods for the detection of circulating tumour DNA involve sequencing somatic mutations using cell-free DNA, but the sensitivity of these methods may be low among patients with early-stage cancer given the limited number of recurrent mutations [2-5]. By contrast, large-scale epigenetic alterations (which are tissue- and cancer-type specific) are not similarly constrained [6] and therefore potentially have greater ability to detect and classify cancers in patients with early-stage disease. Here we develop a sensitive, immunoprecipitation-based protocol to analyse the methylome of small quantities of circulating cell-free DNA, and demonstrate the ability to detect large-scale DNA methylation changes that are enriched for tumour-specific patterns. We also demonstrate robust performance in cancer detection and classification across an extensive collection of plasma samples from several tumour types. This work sets the stage to establish biomarkers for the minimally invasive detection, interception and classification of early-stage cancers based on plasma cell-free DNA methylation patterns.


Subjects
Cell-Free Nucleic Acids/blood; Cell-Free Nucleic Acids/metabolism; DNA Methylation; DNA, Neoplasm/blood; DNA, Neoplasm/metabolism; Early Detection of Cancer/methods; Neoplasms/classification; Neoplasms/genetics; Adenocarcinoma/blood; Adenocarcinoma/genetics; Animals; Biomarkers, Tumor/genetics; Cell Line, Tumor; Colorectal Neoplasms/blood; Colorectal Neoplasms/genetics; DNA Mutational Analysis; Epigenesis, Genetic; Female; Heterografts; Humans; Liquid Biopsy; Male; Mice; Mice, Inbred NOD; Mice, SCID; Neoplasm Transplantation; Neoplasms/blood; Organ Specificity; Pancreatic Neoplasms/blood; Pancreatic Neoplasms/genetics
6.
Bioinformatics ; 38(13): 3327-3336, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35575355

ABSTRACT

MOTIVATION: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, waste time and frustrate users. At worst, interoperability issues could lead to undetected errors in scientific results. RESULTS: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases, that is, potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software's performance on the test suite. AVAILABILITY AND IMPLEMENTATION: Acidbio is available at https://github.com/hoffmangroup/acidbio. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
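To make the idea of input validation on edge cases concrete, here is a minimal sketch of a checker for the first three BED fields (chrom, chromStart, chromEnd). It is not Acidbio's test suite and does not reproduce its rules. The function name `check_bed3_line`, the strict tab-only splitting, and the choice of checks are all assumptions made for illustration.

```python
def _is_uint(text: str) -> bool:
    """True if text is a non-empty run of ASCII digits."""
    return text.isascii() and text.isdigit()


def check_bed3_line(line: str) -> list[str]:
    """Return the problems found in one BED line (empty list if none found).

    Assumption: fields are separated by single tabs; only the first three
    fields are inspected.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 tab-separated fields"]

    chrom, start, end = fields[:3]
    problems = []
    if not chrom:
        problems.append("empty chrom")
    if not _is_uint(start):
        problems.append("chromStart is not a non-negative integer")
    if not _is_uint(end):
        problems.append("chromEnd is not a non-negative integer")
    if _is_uint(start) and _is_uint(end) and int(start) > int(end):
        problems.append("chromStart is greater than chromEnd")
    return problems


# A few hand-written edge cases (hypothetical, not from the paper's suite).
for candidate in ["chr1\t0\t100", "chr1\t100\t50", "chr1\t-1\t50", "chr1 0 100"]:
    print(repr(candidate), check_bed3_line(candidate))
```

A parser that silently accepts the second or third input is the kind of behavior such a test suite is meant to surface.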


Subjects
Genomics; Software; Genomics/methods
7.
Health Res Policy Syst ; 21(1): 43, 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37277824

ABSTRACT

BACKGROUND: In prior research, we identified and prioritized ten measures to assess research performance that comply with the San Francisco Declaration on Research Assessment, a principle adopted worldwide that discourages metrics-based assessment. Given the shift away from assessment based on Journal Impact Factor, we explored potential barriers to implementing and adopting the prioritized measures. METHODS: We identified administrators and researchers across six research institutes, conducted telephone interviews with consenting participants, and used qualitative description and inductive content analysis to derive themes. RESULTS: We interviewed 18 participants: 6 administrators (research institute business managers and directors) and 12 researchers (7 on appointment committees) who varied by career stage (2 early, 5 mid, 5 late). Participants appreciated that the measures were similar to those currently in use, comprehensive, relevant across disciplines, and generated using a rigorous process. They also said the reporting template was easy to understand and use. In contrast, a few administrators thought the measures were not relevant across disciplines. A few participants said it would be time-consuming and difficult to prepare narratives when reporting the measures, and several thought that it would be difficult to objectively evaluate researchers from a different discipline without considerable effort to read their work. Strategies viewed as necessary to overcome barriers and support implementation of the measures included high-level endorsement of the measures, an official launch accompanied by a multi-pronged communication strategy, training for both researchers and evaluators, administrative support or automated reporting for researchers, guidance for evaluators, and sharing of approaches across research institutes. 
CONCLUSIONS: While participants identified many strengths of the measures, they also identified a few limitations and offered corresponding strategies to address the barriers that we will apply at our organization. Ongoing work is needed to develop a framework to help evaluators translate the measures into an overall assessment. Given little prior research that identified research assessment measures and strategies to support adoption of those measures, this research may be of interest to other organizations that assess the quality and impact of research.

8.
PLoS Comput Biol ; 17(10): e1009423, 2021 10.
Article in English | MEDLINE | ID: mdl-34648491

ABSTRACT

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
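The output described above, a partition of the genome with one label per segment, can be sketched independently of any particular SAGA method. The example below assumes a list of per-position integer labels has already been inferred (the inference itself is not shown) and collapses it into half-open `(start, end, label)` segments. The function name and the interval convention are illustrative choices, not taken from the review.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-position labels into (start, end, label) segments.

    Intervals are 0-based and half-open, so consecutive segments tile the
    input without gaps or overlaps.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments


# Toy label sequence (assumed): two labels over six positions.
print(labels_to_segments([0, 0, 1, 1, 1, 0]))
# [(0, 2, 0), (2, 5, 1), (5, 6, 0)]
```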


Subjects
Algorithms; Chromatin/genetics; Genome/genetics; Genomics/methods; Molecular Sequence Annotation/methods; Chromatin Immunoprecipitation Sequencing; Histone Code; Humans; Protein Binding
10.
Brief Bioinform ; 19(4): 693-699, 2018 07 20.
Article in English | MEDLINE | ID: mdl-28088754

ABSTRACT

Investing in documenting your bioinformatics software well can increase its impact and save your time. To maximize the effectiveness of your documentation, we suggest following a few guidelines we propose here. We recommend providing multiple avenues for users to use your research software, including a navigable HTML interface with a quick start, useful help messages with detailed explanation and thorough examples for each feature of your software. By following these guidelines, you can assure that your hard work maximally benefits yourself and others.


Subjects
Computational Biology/methods; Documentation/standards; Guidelines as Topic; Software/standards; Humans
11.
Nature ; 512(7515): 449-52, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164756

ABSTRACT

Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.


Subjects
Caenorhabditis elegans/cytology; Caenorhabditis elegans/genetics; Chromatin/genetics; Chromatin/metabolism; Drosophila melanogaster/cytology; Drosophila melanogaster/genetics; Animals; Cell Line; Centromere/genetics; Centromere/metabolism; Chromatin/chemistry; Chromatin Assembly and Disassembly/genetics; DNA Replication/genetics; Enhancer Elements, Genetic/genetics; Epigenesis, Genetic; Heterochromatin/chemistry; Heterochromatin/genetics; Heterochromatin/metabolism; Histones/chemistry; Histones/metabolism; Humans; Molecular Sequence Annotation; Nuclear Lamina/metabolism; Nucleosomes/chemistry; Nucleosomes/genetics; Nucleosomes/metabolism; Promoter Regions, Genetic/genetics; Species Specificity
12.
Nucleic Acids Res ; 46(20): e120, 2018 11 16.
Article in English | MEDLINE | ID: mdl-30169659

ABSTRACT

Short-read sequencing enables assessment of genetic and biochemical traits of individual genomic regions, such as the location of genetic variation, protein binding and chemical modifications. Every region in a genome assembly has a property called 'mappability', which measures the extent to which it can be uniquely mapped by sequence reads. In regions of lower mappability, estimates of genomic and epigenomic characteristics from sequencing assays are less reliable. These regions have increased susceptibility to spurious mapping from reads from other regions of the genome with sequencing errors or unexpected genetic variation. Bisulfite sequencing approaches used to identify DNA methylation exacerbate these problems by introducing large numbers of reads that map to multiple regions. Both to correct assumptions of uniformity in downstream analysis and to identify regions where the analysis is less reliable, it is necessary to know the mappability of both ordinary and bisulfite-converted genomes. We introduce the Umap software for identifying uniquely mappable regions of any genome. Its Bismap extension identifies mappability of the bisulfite-converted genome. A Umap and Bismap track hub for human genome assemblies GRCh37/hg19 and GRCh38/hg38, and mouse assemblies GRCm37/mm9 and GRCm38/mm10 is available at https://bismap.hoffmanlab.org for use with genome browsers.
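The notion of unique mappability can be illustrated on a toy sequence. The sketch below is not the Umap algorithm: it assumes exact matching on the forward strand only, ignores sequencing errors and bisulfite conversion, and uses a hypothetical function name. It marks each start position whose length-k substring occurs exactly once in the sequence.

```python
from collections import Counter


def unique_kmer_starts(genome: str, k: int) -> list[bool]:
    """For each start position, report whether its k-mer occurs exactly once.

    Simplifying assumptions: forward strand only, exact matches only.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


# Toy sequence (assumed): "ACG" occurs at positions 0 and 3.
print(unique_kmer_starts("ACGACGT", 3))
# [False, True, True, False, True]
```

Reads starting at positions 0 or 3 could not be placed unambiguously in this toy example, which is the situation the abstract describes for low-mappability regions.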


Subjects
Chromosome Mapping/methods; Computational Biology/methods; DNA Methylation; Genome, Human/genetics; CpG Islands/genetics; Epigenomics/methods; Genomics/methods; Humans; Reproducibility of Results; Sequence Analysis, DNA/methods
14.
Bioinformatics ; 34(4): 669-671, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29028889

ABSTRACT

Summary: Segway performs semi-automated genome annotation, discovering joint patterns across multiple genomic signal datasets. We discuss a major new version of Segway and highlight its ability to model data with substantially greater accuracy. Major enhancements in Segway 2.0 include the ability to model data with a mixture of Gaussians, enabling capture of arbitrarily complex signal distributions, and minibatch training, leading to better learned parameters. Availability and implementation: Segway and its source code are freely available for download at http://segway.hoffmanlab.org. We have made available scripts (https://doi.org/10.5281/zenodo.802939) and datasets (https://doi.org/10.5281/zenodo.802906) for this paper's analysis. Contact: michael.hoffman@utoronto.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
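As background for the mixture-of-Gaussians enhancement mentioned above, the following sketch evaluates the density of a one-dimensional Gaussian mixture. It is a generic textbook formula, not Segway's implementation or API. The function names and the example parameters are assumptions for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Density of a Gaussian mixture; weights are assumed to sum to 1."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical bimodal signal: two equally weighted components at -2 and +2.
weights, means, stds = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
for x in (-2.0, 0.0, 2.0):
    print(x, mixture_pdf(x, weights, means, stds))
```

A single Gaussian cannot place high density at both -2 and +2 while keeping low density at 0; a mixture can, which is the sense in which it captures more complex signal distributions.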


Subjects
Genomics/methods; Molecular Sequence Annotation/methods; Sequence Analysis, DNA/methods; Software; Eukaryota/genetics
15.
Inf Fusion ; 50: 71-91, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30467459

ABSTRACT

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

16.
Genome Res ; 25(4): 544-57, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25677182

ABSTRACT

The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation. Previous genomic studies have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly regulated genes expressed in only a small number of cell types, which we term "specific expression domains." We found that domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. Finally, we showed that GBR can be used to transfer information from well-studied cell types to less well-characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data.


Subjects
Chromatin/genetics; Computational Biology/methods; Genomics/methods; Molecular Conformation; Molecular Sequence Annotation/methods; Algorithms; Amino Acid Motifs/genetics; Cell Line, Tumor; Chromatin/metabolism; Chromosome Structures; Genome, Human/genetics; HeLa Cells; Hep G2 Cells; Human Umbilical Vein Endothelial Cells; Humans; Promoter Regions, Genetic/genetics
18.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subjects
Chromatin Immunoprecipitation/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Animals; Genome/genetics; Genomics/methods; Guidelines as Topic; Histones/metabolism; Humans; Internet; Transcription Factors/metabolism
19.
Nat Methods ; 9(5): 473-6, 2012 Mar 18.
Article in English | MEDLINE | ID: mdl-22426492

ABSTRACT

We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.


Subjects
Chromatin/physiology; Genome, Human; Histones/physiology; Transcription Initiation Site; Bayes Theorem; Chromatin/genetics; Histones/genetics; Humans; K562 Cells; Molecular Sequence Data; Promoter Regions, Genetic; Regulatory Sequences, Nucleic Acid; Transcription Factors/genetics; Transcription Factors/physiology
20.
Nucleic Acids Res ; 41(2): 827-41, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23221638

ABSTRACT

The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotation of non-coding regulatory elements correlates strongly with mammalian evolutionary constraint, and provides an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.


Subjects
Chromatin/chemistry; Genome, Human; Molecular Sequence Annotation; Regulatory Elements, Transcriptional; Enhancer Elements, Genetic; Genome-Wide Association Study; Humans; Insulator Elements; Promoter Regions, Genetic; Proteins/genetics; Terminator Regions, Genetic; Transcription, Genetic