Search | VHL Regional Portal

1.

Annotation of nuclear lncRNAs based on chromatin interactions.

Agrawal, Saumya; Buyan, Andrey; Severin, Jessica; Koido, Masaru; Alam, Tanvir; Abugessaisa, Imad; Chang, Howard Y; Dostie, Josée; Itoh, Masayoshi; Kere, Juha; Kondo, Naoto; Li, Yunjing; Makeev, Vsevolod J; Mendez, Mickaël; Okazaki, Yasushi; Ramilowski, Jordan A; Sigorskikh, Andrey I; Strug, Lisa J; Yagi, Ken; Yasuzawa, Kayoko; Yip, Chi Wai; Hon, Chung Chau; Hoffman, Michael M; Terao, Chikashi; Kulakovskiy, Ivan V; Kasukawa, Takeya; Shin, Jay W; Carninci, Piero; de Hoon, Michiel J L.

PLoS One ; 19(5): e0295971, 2024.

Article in English | MEDLINE | ID: mdl-38709794

ABSTRACT

The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Some specific nuclear lncRNAs have been shown to be important regulatory components acting locally. As RNA-chromatin interaction and Hi-C chromatin conformation data showed that chromatin interactions of nuclear lncRNAs are determined by the local chromatin 3D conformation, we used Hi-C data to identify potential target genes of lncRNAs. RNA-protein interaction data suggested that nuclear lncRNAs act as scaffolds to recruit regulatory proteins to target promoters and enhancers. Nuclear lncRNAs may therefore play a role in directing regulatory factors to locations spatially close to the lncRNA gene. We provide the analysis results through an interactive visualization web portal at https://fantom.gsc.riken.jp/zenbu/reports/#F6_3D_lncRNA.

Subject(s)

Chromatin , RNA, Long Noncoding , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Chromatin/metabolism , Chromatin/genetics , Humans , Molecular Sequence Annotation , Cell Nucleus/metabolism , Cell Nucleus/genetics , Genome, Human , Promoter Regions, Genetic

2.

TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods.

Collins, Gary S; Moons, Karel G M; Dhiman, Paula; Riley, Richard D; Beam, Andrew L; Van Calster, Ben; Ghassemi, Marzyeh; Liu, Xiaoxuan; Reitsma, Johannes B; van Smeden, Maarten; Boulesteix, Anne-Laure; Camaradou, Jennifer Catherine; Celi, Leo Anthony; Denaxas, Spiros; Denniston, Alastair K; Glocker, Ben; Golub, Robert M; Harvey, Hugh; Heinze, Georg; Hoffman, Michael M; Kengne, André Pascal; Lam, Emily; Lee, Naomi; Loder, Elizabeth W; Maier-Hein, Lena; Mateen, Bilal A; McCradden, Melissa D; Oakden-Rayner, Lauren; Ordish, Johan; Parnell, Richard; Rose, Sherri; Singh, Karandeep; Wynants, Laure; Logullo, Patricia.

BMJ ; 385: e078378, 2024 04 16.

Article in English | MEDLINE | ID: mdl-38626948

Subject(s)

Decision Support Techniques , Models, Statistical , Humans , Prognosis , Checklist

3.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus; Martel, Anne L; Meijering, Erik; Menze, Bjoern; Moons, Karel G M; Müller, Henning.

Nat Methods ; 21(2): 182-194, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
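One pitfall often raised in this area is that overlap metrics react very differently to the same boundary error depending on structure size. The sketch below is illustrative only and is not taken from the paper. It computes the Dice similarity coefficient and intersection over union (IoU) on flat binary masks and applies the same one-pixel shift to a small and a large structure. The toy masks and the convention that two empty masks score 1.0 are assumptions.

```python
from typing import Sequence, Tuple


def _counts(pred: Sequence[int], ref: Sequence[int]) -> Tuple[int, int, int]:
    """Return (intersection, |pred|, |ref|) for two flat binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    return inter, sum(1 for p in pred if p), sum(1 for r in ref if r)


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    inter, n_pred, n_ref = _counts(pred, ref)
    total = n_pred + n_ref
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2 * inter / total


def iou(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Intersection over union: |A & B| / |A | B|."""
    inter, n_pred, n_ref = _counts(pred, ref)
    union = n_pred + n_ref - inter
    return 1.0 if union == 0 else inter / union


# The same one-pixel shift applied to a small and a large structure.
small_ref = [0, 1, 1, 0, 0]
small_pred = [0, 0, 1, 1, 0]
large_ref = [0] + [1] * 100 + [0]
large_pred = [0, 0] + [1] * 100

print(f"small: Dice={dice(small_pred, small_ref):.2f}  IoU={iou(small_pred, small_ref):.2f}")
print(f"large: Dice={dice(large_pred, large_ref):.2f}  IoU={iou(large_pred, large_ref):.2f}")
```

The small structure's Dice falls to 0.50, while the large one stays at 0.99, even though the prediction error is one pixel in both cases.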

Subject(s)

Artificial Intelligence

4.

Metrics reloaded: recommendations for image analysis validation.

Maier-Hein, Lena; Reinke, Annika; Godau, Patrick; Tizabi, Minu D; Buettner, Florian; Christodoulou, Evangelia; Glocker, Ben; Isensee, Fabian; Kleesiek, Jens; Kozubek, Michal; Reyes, Mauricio; Riegler, Michael A; Wiesenfarth, Manuel; Kavur, A Emre; Sudre, Carole H; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Rädsch, Tim; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew B; Cardoso, M Jorge; Cheplygina, Veronika; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kofler, Florian; Kopp-Schneider, Annette; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin.

Nat Methods ; 21(2): 195-212, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

Subject(s)

Algorithms , Image Processing, Computer-Assisted , Machine Learning , Semantics

5.

Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet.

Viner, Coby; Ishak, Charles A; Johnson, James; Walker, Nicolas J; Shi, Hui; Sjöberg-Herrera, Marcela K; Shen, Shu Yi; Lardo, Santana M; Adams, David J; Ferguson-Smith, Anne C; De Carvalho, Daniel D; Hainer, Sarah J; Bailey, Timothy L; Hoffman, Michael M.

Genome Biol ; 25(1): 11, 2024 01 08.

Article in English | MEDLINE | ID: mdl-38191487

ABSTRACT

BACKGROUND: Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS: Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS: Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
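The abstract describes adapting the position weight matrix (PWM) to an alphabet larger than A/C/G/T. As a minimal sketch of that general idea, and not the authors' Cytomod or MEME Suite code, the block below builds a log2-odds PWM over a six-letter alphabet and scans a sequence with it. The letters "m" (5-methylcytosine) and "h" (5-hydroxymethylcytosine) loosely follow Cytomod's single-letter codes, but treat that mapping as an assumption. The toy counts, the pseudocount of 0.5, and the uniform background are all illustrative choices.

```python
import math
from typing import Dict, List, Tuple

# Assumed expanded alphabet: "m" = 5-methylcytosine, "h" = 5-hydroxymethylcytosine.
ALPHABET = "ACGTmh"

Column = Dict[str, float]


def log_odds_pwm(counts: List[Column], background: Column,
                 pseudocount: float = 0.5) -> List[Column]:
    """Convert per-position letter counts into a log2-odds PWM over ALPHABET."""
    pwm: List[Column] = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append({
            b: math.log2((column.get(b, 0.0) + pseudocount) / total / background[b])
            for b in ALPHABET
        })
    return pwm


def score(pwm: List[Column], site: str) -> float:
    """Sum of per-position log-odds for a site of the motif's length."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(col[base] for col, base in zip(pwm, site))


def best_hit(pwm: List[Column], seq: str) -> Tuple[int, float]:
    """Scan every window of seq and return (offset, score) of the best one."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    return max(((i, score(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)),
               key=lambda hit: hit[1])


# Toy three-position motif whose middle position prefers "m" over "C".
background = {b: 1 / len(ALPHABET) for b in ALPHABET}  # uniform: a simplification
counts = [{"C": 10}, {"m": 9, "C": 1}, {"G": 10}]
pwm = log_odds_pwm(counts, background)
print(score(pwm, "CmG") > score(pwm, "CCG"))
print(best_hit(pwm, "ATCmGA"))
```

Because "m" is an ordinary letter here, the same scoring code distinguishes a methylated from an unmethylated site without any special casing. A real genome-wide background would be far from uniform, especially for the modified letters.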

Subject(s)

Gene Expression Regulation , Transcription Factors , Epigenomics , DNA , Epigenesis, Genetic

6.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kenngott, Hannes; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus.

ArXiv ; 2024 Feb 23.

Article in English | MEDLINE | ID: mdl-36945687

ABSTRACT

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

7.

Epigenetic reprogramming of a distal developmental enhancer cluster drives SOX2 overexpression in breast and lung adenocarcinoma.

Abatti, Luis E; Lado-Fernández, Patricia; Huynh, Linh; Collado, Manuel; Hoffman, Michael M; Mitchell, Jennifer A.

Nucleic Acids Res ; 51(19): 10109-10131, 2023 10 27.

Article in English | MEDLINE | ID: mdl-37738673

ABSTRACT

Enhancer reprogramming has been proposed as a key source of transcriptional dysregulation during tumorigenesis, but the molecular mechanisms underlying this process remain unclear. Here, we identify an enhancer cluster required for normal development that is aberrantly activated in breast and lung adenocarcinoma. Deletion of the SRR124-134 cluster disrupts expression of the SOX2 oncogene, dysregulates genome-wide transcription and chromatin accessibility and reduces the ability of cancer cells to form colonies in vitro. Analysis of primary tumors reveals a correlation between chromatin accessibility at this cluster and SOX2 overexpression in breast and lung cancer patients. We demonstrate that FOXA1 is an activator and NFIB is a repressor of SRR124-134 activity and SOX2 transcription in cancer cells, revealing a co-opting of the regulatory mechanisms involved in early development. Notably, we show that the conserved SRR124 and SRR134 regions are essential during mouse development, where homozygous deletion results in the lethal failure of esophageal-tracheal separation. These findings provide insights into how developmental enhancers can be reprogrammed during tumorigenesis and underscore the importance of understanding enhancer dynamics during development and disease.

The manuscript by Abatti et al. shows that epigenetic reactivation of a pair of distal enhancers that drive Sox2 expression during development (to permit separation of the esophagus and trachea) is responsible for the tumor-promoting re-expression of SOX2 in breast and lung tumors. Intriguingly, the same transcription factors that act on the enhancers during development to either activate or repress them (i.e. FOXA1 and NFIB, respectively) are also required for altering chromatin accessibility of the enhancers and SOX2 transcription in breast and lung cancer cells. With their work, the authors unravel the exact mechanism of how developmentally active enhancers become repurposed in a tumor context and show the relevance of this repurposing event for cancer.

Subject(s)

Adenocarcinoma of Lung , Lung Neoplasms , SOXB1 Transcription Factors , Animals , Humans , Mice , Adenocarcinoma of Lung/genetics , Carcinogenesis/genetics , Chromatin/genetics , Enhancer Elements, Genetic , Epigenesis, Genetic , Homozygote , Lung Neoplasms/genetics , Sequence Deletion , SOXB1 Transcription Factors/genetics , SOXB1 Transcription Factors/metabolism

8.

Exploring the merits of research performance measures that comply with the San Francisco Declaration on Research Assessment and strategies to overcome barriers of adoption: qualitative interviews with administrators and researchers.

Boury, Himani; Albert, Mathieu; Chen, Robert H C; Chow, James C L; DaCosta, Ralph; Hoffman, Michael M; Keshavarz, Behrang; Kontos, Pia; McAndrews, Mary Pat; Protze, Stephanie; Gagliardi, Anna R.

Health Res Policy Syst ; 21(1): 43, 2023 Jun 05.

Article in English | MEDLINE | ID: mdl-37277824

ABSTRACT

BACKGROUND: In prior research, we identified and prioritized ten measures to assess research performance that comply with the San Francisco Declaration on Research Assessment, a principle adopted worldwide that discourages metrics-based assessment. Given the shift away from assessment based on Journal Impact Factor, we explored potential barriers to implementing and adopting the prioritized measures. METHODS: We identified administrators and researchers across six research institutes, conducted telephone interviews with consenting participants, and used qualitative description and inductive content analysis to derive themes. RESULTS: We interviewed 18 participants: 6 administrators (research institute business managers and directors) and 12 researchers (7 on appointment committees) who varied by career stage (2 early, 5 mid, 5 late). Participants appreciated that the measures were similar to those currently in use, comprehensive, relevant across disciplines, and generated using a rigorous process. They also said the reporting template was easy to understand and use. In contrast, a few administrators thought the measures were not relevant across disciplines. A few participants said it would be time-consuming and difficult to prepare narratives when reporting the measures, and several thought that it would be difficult to objectively evaluate researchers from a different discipline without considerable effort to read their work. Strategies viewed as necessary to overcome barriers and support implementation of the measures included high-level endorsement of the measures, an official launch accompanied by a multi-pronged communication strategy, training for both researchers and evaluators, administrative support or automated reporting for researchers, guidance for evaluators, and sharing of approaches across research institutes. CONCLUSIONS: While participants identified many strengths of the measures, they also identified a few limitations and offered corresponding strategies to address the barriers that we will apply at our organization. Ongoing work is needed to develop a framework to help evaluators translate the measures into an overall assessment. Given little prior research that identified research assessment measures and strategies to support adoption of those measures, this research may be of interest to other organizations that assess the quality and impact of research.

9.

Human papillomavirus integration transforms chromatin to drive oncogenesis.

Karimzadeh, Mehran; Arlidge, Christopher; Rostami, Ariana; Lupien, Mathieu; Bratman, Scott V; Hoffman, Michael M.

Genome Biol ; 24(1): 142, 2023 06 27.

Article in English | MEDLINE | ID: mdl-37365652

ABSTRACT

BACKGROUND: Human papillomavirus (HPV) drives almost all cervical cancers and up to 70% of head and neck cancers. Frequent integration into the host genome occurs predominantly in tumorigenic types of HPV. We hypothesize that changes in chromatin state at the location of integration can result in changes in gene expression that contribute to the tumorigenicity of HPV. RESULTS: We find that viral integration events often occur along with changes in chromatin state and expression of genes near the integration site. We investigate whether introduction of new transcription factor binding sites due to HPV integration could invoke these changes. Some regions within the HPV genome, particularly the position of a conserved CTCF binding site, show enriched chromatin accessibility signal. ChIP-seq reveals that the conserved CTCF binding site within the HPV genome binds CTCF in 4 HPV+ cancer cell lines. Significant changes in CTCF binding pattern and increases in chromatin accessibility occur exclusively within 100 kbp of HPV integration sites. The chromatin changes co-occur with out-sized changes in transcription and alternative splicing of local genes. Analysis of The Cancer Genome Atlas (TCGA) HPV+ tumors indicates that HPV integration upregulates genes which have significantly higher essentiality scores compared to randomly selected upregulated genes from the same tumors. CONCLUSIONS: Our results suggest that introduction of a new CTCF binding site due to HPV integration reorganizes chromatin state and upregulates genes essential for tumor viability in some HPV+ tumors. These findings emphasize a newly recognized role of HPV integration in oncogenesis.
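The abstract localizes the chromatin changes to a 100 kbp window around integration sites. The block below is a small, hypothetical sketch of how genes might be selected within such a window. It is not the authors' pipeline. The function name, the representation of each gene by a single coordinate (for example, a transcription start site), and the inclusive window boundary are assumptions.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

WINDOW_BP = 100_000  # 100 kbp, the distance reported in the abstract


def genes_near_sites(genes: List[Tuple[str, str, int]],
                     sites: Dict[str, List[int]],
                     window: int = WINDOW_BP) -> List[str]:
    """Return names of genes lying within `window` bp of any integration site
    on the same chromosome (boundary inclusive).

    genes: (name, chromosome, position) tuples; sites: chromosome -> positions.
    """
    sorted_sites = {chrom: sorted(pos) for chrom, pos in sites.items()}
    hits: List[str] = []
    for name, chrom, pos in genes:
        chrom_sites = sorted_sites.get(chrom, [])
        # First site at or after pos - window; it is a hit if it is also <= pos + window.
        i = bisect_left(chrom_sites, pos - window)
        if i < len(chrom_sites) and chrom_sites[i] <= pos + window:
            hits.append(name)
    return hits


# Hypothetical genes and one hypothetical integration site.
genes = [("A", "chr3", 1_000_000), ("B", "chr3", 1_200_000), ("C", "chr8", 500)]
sites = {"chr3": [1_050_000]}
print(genes_near_sites(genes, sites))
```

Sorting the sites once per chromosome and using binary search keeps each gene lookup logarithmic in the number of sites, which matters little for a handful of integrations but keeps the sketch scalable.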

Subject(s)

Head and Neck Neoplasms , Papillomavirus Infections , Humans , Chromatin , Human Papillomavirus Viruses , Carcinogenesis

10.

Motif elucidation in ChIP-seq datasets with a knockout control.

Denisko, Danielle; Viner, Coby; Hoffman, Michael M.

Bioinform Adv ; 3(1): vbad031, 2023.

Article in English | MEDLINE | ID: mdl-37033469

ABSTRACT

Summary: Chromatin immunoprecipitation-sequencing is widely used to find transcription factor binding sites, but suffers from various sources of noise. Knocking out the target factor mitigates noise by acting as a negative control. Paired wild-type and knockout (KO) experiments can generate improved motifs but require optimal differential analysis. We introduce peaKO, a computational method to automatically optimize motif analyses with KO controls, which we compare to two other methods. PeaKO often improves elucidation of the target factor and highlights the benefits of KO controls, which far outperform input controls. Availability and implementation: PeaKO is freely available at https://peako.hoffmanlab.org. Contact: michael.hoffman@utoronto.ca.

11.

Community consensus on core open science practices to monitor in biomedicine.

Cobey, Kelly D; Haustein, Stefanie; Brehaut, Jamie; Dirnagl, Ulrich; Franzen, Delwen L; Hemkens, Lars G; Presseau, Justin; Riedel, Nico; Strech, Daniel; Alperin, Juan Pablo; Costas, Rodrigo; Sena, Emily S; van Leeuwen, Thed; Ardern, Clare L; Bacellar, Isabel O L; Camack, Nancy; Britto Correa, Marcos; Buccione, Roberto; Cenci, Maximiliano Sergio; Fergusson, Dean A; Gould van Praag, Cassandra; Hoffman, Michael M; Moraes Bielemann, Renata; Moschini, Ugo; Paschetta, Mauro; Pasquale, Valentina; Rac, Valeria E; Roskams-Edris, Dylan; Schatzl, Hermann M; Stratton, Jo Anne; Moher, David.

PLoS Biol ; 21(1): e3001949, 2023 01.

Article in English | MEDLINE | ID: mdl-36693044

ABSTRACT

The state of open science needs to be monitored to track changes over time and identify areas to create interventions to drive improvements. In order to monitor open science practices, they first need to be well defined and operationalized. To reach consensus on what open science practices to monitor at biomedical research institutions, we conducted a modified 3-round Delphi study. Participants were research administrators, researchers, specialists in dedicated open science roles, and librarians. In rounds 1 and 2, participants completed an online survey evaluating a set of potential open science practices, and for round 3, we hosted two half-day virtual meetings to discuss and vote on items that had not reached consensus. Ultimately, participants reached consensus on 19 open science practices. This core set of open science practices will form the foundation for institutional dashboards and may also be of value for the development of policy, education, and interventions.

Subject(s)

Biomedical Research , Humans , Consensus , Delphi Technique , Surveys and Questionnaires , Research Design

12.

Sensitive and reproducible cell-free methylome quantification with synthetic spike-in controls.

Wilson, Samantha L; Shen, Shu Yi; Harmon, Lauren; Burgener, Justin M; Triche, Tim; Bratman, Scott V; De Carvalho, Daniel D; Hoffman, Michael M.

Cell Rep Methods ; 2(9): 100294, 2022 09 19.

Article in English | MEDLINE | ID: mdl-36160046

ABSTRACT

Cell-free methylated DNA immunoprecipitation sequencing (cfMeDIP-seq) identifies genomic regions with DNA methylation, using a protocol adapted to work with low-input DNA samples and with cell-free DNA (cfDNA). We developed a set of synthetic spike-in DNA controls for cfMeDIP-seq to provide a simple and inexpensive reference for quantitative normalization. We designed 54 DNA fragments with combinations of methylation status (methylated and unmethylated), fragment length (80 bp, 160 bp, 320 bp), G + C content (35%, 50%, 65%), and fraction of CpG dinucleotides within the fragment (1/80 bp, 1/40 bp, 1/20 bp). Using 0.01 ng of spike-in controls enables training a generalized linear model that absolutely quantifies methylated cfDNA in MeDIP-seq experiments. It mitigates batch effects and corrects for biases in enrichment due to known biophysical properties of DNA fragments and other technical biases.
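The figure of 54 fragments follows directly from crossing every level of the four listed factors: 2 methylation states × 3 lengths × 3 G + C contents × 3 CpG densities = 54. The sketch below only enumerates those design parameters. It assumes a full factorial design, as the count implies, and reads "1/80 bp" as one CpG per 80 bp. It does not reproduce the actual spike-in sequences or the generalized linear model. The class and function names are hypothetical.

```python
from itertools import product
from typing import List, NamedTuple


class SpikeIn(NamedTuple):
    methylated: bool
    length_bp: int
    gc_percent: int
    cpg_spacing_bp: int  # one CpG per this many bp (assumed reading of "1/80 bp")

    @property
    def expected_cpgs(self) -> int:
        """CpG count implied by the fragment length and CpG spacing."""
        return self.length_bp // self.cpg_spacing_bp


def design() -> List[SpikeIn]:
    """Full factorial over the four factors listed in the abstract."""
    return [SpikeIn(*combo) for combo in product(
        (True, False),      # methylation status
        (80, 160, 320),     # fragment length (bp)
        (35, 50, 65),       # G + C content (%)
        (80, 40, 20),       # CpG spacing: 1/80, 1/40, 1/20 bp
    )]


fragments = design()
print(len(fragments))  # 2 * 3 * 3 * 3 = 54
print(min(f.expected_cpgs for f in fragments), max(f.expected_cpgs for f in fragments))
```

Under this reading, CpG counts per fragment range from 1 (80 bp at 1/80 bp) to 16 (320 bp at 1/20 bp), which spans the range a normalization model would need to learn enrichment biases over.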

Subject(s)

Cell-Free Nucleic Acids , Epigenome , Genomics/methods , DNA Methylation , DNA/genetics , Cell-Free Nucleic Acids/genetics

13.

Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome.

Karimzadeh, Mehran; Hoffman, Michael M.

Genome Biol ; 23(1): 126, 2022 06 10.

Article in English | MEDLINE | ID: mdl-35681170

ABSTRACT

Existing methods for computational prediction of transcription factor (TF) binding sites evaluate genomic regions with similarity to known TF sequence preferences. Most TF binding sites, however, do not resemble known TF sequence motifs, and many TFs are not sequence-specific. We developed Virtual ChIP-seq, which predicts binding of individual TFs in new cell types, integrating learned associations with gene expression and binding, TF binding sites from other cell types, and chromatin accessibility data in the new cell type. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 TFs (MCC>0.3).
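The abstract reports performance as the Matthews correlation coefficient (MCC), with 0.3 as the threshold. MCC uses all four cells of the confusion matrix, which keeps it informative when one class (unbound genomic regions, for instance) vastly outnumbers the other. The sketch below is the standard MCC formula and is not code from Virtual ChIP-seq. The confusion-matrix counts in the example are made up. Returning 0.0 when a marginal total is zero is a common convention, assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero (an assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative, made-up counts for genomic bins labelled bound/unbound.
example = mcc(tp=90, tn=850, fp=40, fn=20)
print(f"MCC = {example:.3f}; passes 0.3 threshold: {example > 0.3}")
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction), so a threshold of 0.3 asks for a clearly better-than-chance predictor rather than a near-perfect one.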

Subject(s)

Chromatin Immunoprecipitation Sequencing , Transcriptome , Binding Sites , Chromatin Immunoprecipitation , Protein Binding , Transcription Factors/metabolism

14.

Assessing and assuring interoperability of a genomics file format.

Niu, Yi Nian; Roberts, Eric G; Denisko, Danielle; Hoffman, Michael M.

Bioinformatics ; 38(13): 3327-3336, 2022 06 27.

Article in English | MEDLINE | ID: mdl-35575355

ABSTRACT

MOTIVATION: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, waste time and frustrate users. At worst, interoperability issues could lead to undetected errors in scientific results. RESULTS: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases, potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software's performance on the test suite. AVAILABILITY AND IMPLEMENTATION: Acidbio is available at https://github.com/hoffmangroup/acidbio. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
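To make "edge cases" concrete, the sketch below checks a single BED line against a small set of rules: tab-separated fields, at least three columns, non-negative integer 0-based coordinates with chromStart not exceeding chromEnd, an optional score from 0 to 1000, and an optional strand of "+", "-", or ".". This is a hypothetical rule set for illustration and is neither Acidbio's test suite nor a complete reading of any BED specification. Skipping blank lines and lines beginning with "#" is also an assumption.

```python
import re
from typing import List

_UINT = re.compile(r"[0-9]+")  # ASCII digits only (str.isdigit also accepts e.g. "²")


def check_bed_line(line: str) -> List[str]:
    """Return the problems found in one BED line (an empty list means accepted).

    Hypothetical rules for illustration only; not Acidbio's checks.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return []  # blank lines and comments are skipped (assumption)
    fields = line.split("\t")
    if len(fields) < 3:
        return [f"expected at least 3 tab-separated fields, found {len(fields)}"]
    errors: List[str] = []
    chrom, start, end = fields[:3]
    if not chrom or any(c.isspace() for c in chrom):
        errors.append("chrom must be non-empty and contain no whitespace")
    if not (_UINT.fullmatch(start) and _UINT.fullmatch(end)):
        errors.append("chromStart and chromEnd must be non-negative integers")
    elif int(start) > int(end):
        errors.append("chromStart must not exceed chromEnd")
    if len(fields) >= 5 and not (_UINT.fullmatch(fields[4]) and int(fields[4]) <= 1000):
        errors.append("score must be an integer from 0 to 1000")
    if len(fields) >= 6 and fields[5] not in {"+", "-", "."}:
        errors.append("strand must be '+', '-', or '.'")
    return errors


for example in ["chr1\t0\t100", "chr1 0 100", "chr1\t100\t50",
                "chr1\t0\t100\tpeak1\t1001\t+"]:
    print(repr(example), check_bed_line(example))
```

The space-delimited line is a typical interoperability trap: some tools accept it by splitting on any whitespace, while a strictly tab-delimited parser sees a single field.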

Subject(s)

Genomics , Software , Genomics/methods

15.

Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns.

Libbrecht, Maxwell W; Chan, Rachel C W; Hoffman, Michael M.

PLoS Comput Biol ; 17(10): e1009423, 2021 10.

Article in English | MEDLINE | ID: mdl-34648491

ABSTRACT

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
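Widely used SAGA tools such as ChromHMM (a hidden Markov model) and Segway (a dynamic Bayesian network) share the idea of assigning each genomic bin a label so that similar input patterns receive the same label. The block below is a toy two-label hidden Markov model with Viterbi decoding over a binarized signal. It is a sketch of that general framework under stated assumptions, not either tool's implementation. The labels, transition probabilities, and emission probabilities are made up for illustration, and parameter learning (for example, by expectation-maximization) is omitted.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most probable label path for discrete observations under a toy HMM.

    Works in log space so long sequences do not underflow. All probabilities
    must be strictly positive here, since math.log(0) raises ValueError.
    """
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    backptrs: List[List[int]] = []
    for o in obs[1:]:
        ptrs, nxt = [], []
        for s in range(n):
            # Best previous label for arriving in label s at this bin.
            prev = max(range(n), key=lambda p: score[p] + math.log(trans[p][s]))
            ptrs.append(prev)
            nxt.append(score[prev] + math.log(trans[prev][s]) + math.log(emit[s][o]))
        backptrs.append(ptrs)
        score = nxt
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptrs):  # trace back from the last bin
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def to_segments(path: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-bin labels into half-open (start, end, label) segments."""
    segments: List[Tuple[int, int, int]] = []
    begin = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[begin]:
            segments.append((begin, i, path[begin]))
            begin = i
    return segments


# Hypothetical labels: 0 = "quiescent", 1 = "active"; observations are a
# binarized signal (1 = mark present in the bin).
obs = [0, 0, 0, 1, 1, 1, 1, 0, 0]
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" labels favour long segments
emit = [[0.9, 0.1], [0.2, 0.8]]   # P(observation | label)
print(to_segments(viterbi(obs, start, trans, emit)))
```

The transition matrix is what makes this segmentation rather than plain per-bin clustering: high self-transition probabilities penalize label changes, so isolated noisy bins tend to be absorbed into the surrounding segment.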

Subject(s)

Algorithms , Chromatin/genetics , Genome/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Chromatin Immunoprecipitation Sequencing , Histone Code , Humans , Protein Binding

16.

Reproducibility standards for machine learning in the life sciences.

Heil, Benjamin J; Hoffman, Michael M; Markowetz, Florian; Lee, Su-In; Greene, Casey S; Hicks, Stephanie C.

Nat Methods ; 18(10): 1132-1135, 2021 10.

Article in English | MEDLINE | ID: mdl-34462593

Subject(s)

Computational Biology/methods , Computational Biology/standards , Machine Learning/standards , Reproducibility of Results , Software

17.

Tumor-Naïve Multimodal Profiling of Circulating Tumor DNA in Head and Neck Squamous Cell Carcinoma.

Burgener, Justin M; Zou, Jinfeng; Zhao, Zhen; Zheng, Yangqiao; Shen, Shu Yi; Huang, Shao Hui; Keshavarzi, Sareh; Xu, Wei; Liu, Fei-Fei; Liu, Geoffrey; Waldron, John N; Weinreb, Ilan; Spreafico, Anna; Siu, Lillian L; de Almeida, John R; Goldstein, David P; Hoffman, Michael M; De Carvalho, Daniel D; Bratman, Scott V.

Clin Cancer Res ; 27(15): 4230-4244, 2021 08 01.

Article in English | MEDLINE | ID: mdl-34158359

ABSTRACT

PURPOSE: Circulating tumor DNA (ctDNA) enables personalized treatment strategies in oncology by providing a noninvasive source of clinical biomarkers. In patients with low ctDNA abundance, tumor-naïve methods are needed to facilitate clinical implementation. Here, using locoregionally confined head and neck squamous cell carcinoma (HNSCC) as an example, we demonstrate tumor-naïve detection of ctDNA by simultaneous profiling of mutations and methylation. EXPERIMENTAL DESIGN: We conducted CAncer Personalized Profiling by deep Sequencing (CAPP-seq) and cell-free Methylated DNA ImmunoPrecipitation and high-throughput sequencing (cfMeDIP-seq) for detection of ctDNA-derived somatic mutations and aberrant methylation, respectively. We analyzed 77 plasma samples from 30 patients with stage I-IVA human papillomavirus-negative HNSCC as well as plasma samples from 20 risk-matched healthy controls. In addition, we analyzed leukocytes from patients and controls. RESULTS: CAPP-seq identified mutations in 20 of 30 patients at frequencies similar to those in The Cancer Genome Atlas (TCGA). Differential methylation analysis of cfMeDIP-seq profiles identified 941 ctDNA-derived hypermethylated regions enriched for CpG islands and HNSCC-specific methylation patterns. Both methods demonstrated an association between ctDNA abundance and shorter fragment lengths. In addition, mutation- and methylation-based ctDNA abundance was highly correlated (r > 0.85). Patients with detectable pretreatment ctDNA by both methods demonstrated significantly worse overall survival (HR = 7.5; P = 0.025) independent of clinical stage, with lack of ctDNA clearance post-treatment strongly correlating with recurrence. We further leveraged cfMeDIP-seq profiles to validate a prognostic signature identified from TCGA samples. CONCLUSIONS: Tumor-naïve detection of ctDNA by multimodal profiling may facilitate biomarker discovery and clinical use in low ctDNA abundance applications.

Subject(s)

Circulating Tumor DNA/blood , Circulating Tumor DNA/genetics , Head and Neck Neoplasms/blood , Head and Neck Neoplasms/genetics , Squamous Cell Carcinoma of Head and Neck/blood , Squamous Cell Carcinoma of Head and Neck/genetics , DNA Methylation , Humans , Mutation , Prospective Studies

18.

Sharing biological data: why, when, and how.

Wilson, Samantha L; Way, Gregory P; Bittremieux, Wout; Armache, Jean-Paul; Haendel, Melissa A; Hoffman, Michael M.

FEBS Lett ; 595(7): 847-863, 2021 04.

Article in English | MEDLINE | ID: mdl-33843054

Subject(s)

Data Systems , Information Dissemination , Microscopy/trends , Humans , Proteomics/trends

19.

Transparency and reproducibility in artificial intelligence.

Haibe-Kains, Benjamin; Adam, George Alexandru; Hosny, Ahmed; Khodakarami, Farnoosh; Waldron, Levi; Wang, Bo; McIntosh, Chris; Goldenberg, Anna; Kundaje, Anshul; Greene, Casey S; Broderick, Tamara; Hoffman, Michael M; Leek, Jeffrey T; Korthauer, Keegan; Huber, Wolfgang; Brazma, Alvis; Pineau, Joelle; Tibshirani, Robert; Hastie, Trevor; Ioannidis, John P A; Quackenbush, John; Aerts, Hugo J W L.

Nature ; 586(7829): E14-E16, 2020 10.

Article in English | MEDLINE | ID: mdl-33057217

Subject(s)

Algorithms , Artificial Intelligence , Reproducibility of Results

20.

A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types.

Libbrecht, Maxwell W; Rodriguez, Oscar L; Weng, Zhiping; Bilmes, Jeffrey A; Hoffman, Michael M; Noble, William Stafford.

Genome Biol ; 20(1): 180, 2019 08 28.

Article in English | MEDLINE | ID: mdl-31462275

ABSTRACT

Semi-automated genome annotation methods such as Segway take as input a set of genome-wide measurements such as of histone modification or DNA accessibility and output an annotation of genomic activity in the target cell type. Here we present annotations of 164 human cell types using 1615 data sets. To produce these annotations, we automated the label interpretation step to produce a fully automated annotation strategy. Using these annotations, we developed a measure of the importance of each genomic position called the "conservation-associated activity score." We further combined all annotations into a single, cell type-agnostic encyclopedia that catalogs all human regulatory elements.

Subject(s)

DNA/genetics , Databases, Genetic , Molecular Sequence Annotation , Algorithms , Automation , Cell Line , Humans , Machine Learning , Phenotype , Transcription, Genetic
