ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP-seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC-seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation. We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.
Subjects
Chromatin/genetics , Chromatin/metabolism , Datasets as Topic , Fetal Development/genetics , Histones/metabolism , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/chemistry , Chromatin Immunoprecipitation Sequencing , Disease/genetics , Enhancer Elements, Genetic/genetics , Female , Gene Expression Regulation, Developmental/genetics , Genetic Variation , Histones/chemistry , Humans , Male , Mice , Mice, Inbred C57BL , Organ Specificity/genetics , Reproducibility of Results , Transposases/metabolism
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
BACKGROUND: Commonly used approaches for genomic investigation of bacterial outbreaks, including SNP and gene-by-gene approaches, are limited by the requirement for background genomes and curated allele schemes, respectively. As a result, they only work on a select subset of known organisms, and fail on novel or less studied pathogens. We introduce refMLST, a gene-by-gene approach using the reference genome of a bacterium to form a scalable, reproducible and robust method to perform outbreak investigation. RESULTS: When applied to multiple outbreak-causing bacteria including 1263 Salmonella enterica, 331 Yersinia enterocolitica and 6526 Campylobacter jejuni genomes, refMLST enabled consistent clustering, improved resolution, and faster processing in comparison to commonly used tools like chewieSnake. CONCLUSIONS: refMLST is a novel multilocus sequence typing approach that is applicable to any bacterial species with a public reference genome, does not require a curated scheme, and automatically accounts for genetic recombination. AVAILABILITY AND IMPLEMENTATION: refMLST is freely available for academic use at https://bugseq.com/academic .
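To illustrate the gene-by-gene idea mentioned in this abstract, the sketch below compares two isolates by counting the loci at which they carry different alleles. It is a generic allele-profile distance under assumed inputs, not refMLST's actual implementation; the profile layout, the use of `None` for an uncalled locus, and the name `allele_distance` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical layout: locus name -> allele identifier (None if the locus was not called).
Profile = Dict[str, Optional[str]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci where both isolates have a called allele and the alleles differ."""
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Toy profiles (invented for illustration only).
isolate_1 = {"geneA": "a1", "geneB": "b7", "geneC": None}
isolate_2 = {"geneA": "a1", "geneB": "b2", "geneC": "c4"}
print(allele_distance(isolate_1, isolate_2))  # 1
```

Loci that are missing in either isolate are skipped rather than counted as differences, which is one common convention; other tools may treat missing data differently.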
Subjects
Bacterial Typing Techniques , Multilocus Sequence Typing , Multilocus Sequence Typing/methods , Bacterial Typing Techniques/methods , Genome, Bacterial/genetics , Salmonella enterica/genetics , Salmonella enterica/classification , Campylobacter jejuni/genetics , Campylobacter jejuni/classification , Disease Outbreaks , Yersinia enterocolitica/genetics , Yersinia enterocolitica/classification , Software
ABSTRACT
Signals of natural selection can be quickly eroded in high gene flow systems, curtailing efforts to understand how and when genetic adaptation occurs in the ocean. This long-standing, unresolved topic in ecology and evolution has renewed importance because changing environmental conditions are driving range expansions that may necessitate rapid evolutionary responses. One example occurs in Kellet's whelk (Kelletia kelletii), a common subtidal gastropod with an ~40- to 60-day pelagic larval duration that expanded its biogeographic range northwards in the 1970s by over 300 km. To test for genetic adaptation, we performed a series of experimental crosses with Kellet's whelk adults collected from their historical (HxH) and recently expanded range (ExE), and conducted RNA-Seq on offspring that we reared in a common garden environment. We identified 2770 differentially expressed genes (DEGs) between 54 offspring samples with either only historical range (HxH offspring) or expanded range (ExE offspring) ancestry. Using SNPs called directly from the DEGs, we assigned samples of known origin back to their range of origin with unprecedented accuracy for a marine species (92.6% and 94.5% for HxH and ExE offspring, respectively). The SNP with the highest predictive importance occurred on triosephosphate isomerase (TPI), an essential metabolic enzyme involved in cold stress response. TPI was significantly upregulated and contained a non-synonymous mutation in the expanded range. Our findings pave the way for accurately identifying patterns of dispersal, gene flow and population connectivity in the ocean by demonstrating that experimental transcriptomics can reveal mechanisms for how marine organisms respond to changing environmental conditions.
ABSTRACT
Transcriptome-wide maps of RNA binding protein (RBP)-RNA interactions by immunoprecipitation (IP)-based methods such as RNA IP (RIP) and crosslinking and IP (CLIP) are key starting points for evaluating the molecular roles of the thousands of human RBPs. A significant bottleneck to the application of these methods in diverse cell lines, tissues, and developmental stages is the availability of validated IP-quality antibodies. Using IP followed by immunoblot assays, we have developed a validated repository of 438 commercially available antibodies that interrogate 365 unique RBPs. In parallel, 362 short-hairpin RNA (shRNA) constructs against 276 unique RBPs were also used to confirm specificity of these antibodies. These antibodies can characterize subcellular RBP localization. With the burgeoning interest in the roles of RBPs in cancer, neurobiology, and development, these resources are invaluable to the broad scientific community. Detailed information about these resources is publicly available at the ENCODE portal (https://www.encodeproject.org/).
Subjects
Databases, Genetic , RNA-Binding Proteins/genetics , RNA/metabolism , Transcriptome/genetics , Binding Sites , Humans , Protein Binding , RNA/genetics , RNA, Small Interfering/classification , RNA, Small Interfering/genetics , RNA-Binding Proteins/metabolism
ABSTRACT
ATP synthase, H+ transporting, mitochondrial F1 complex, δ subunit (ATP5F1D; formerly ATP5D) is a subunit of mitochondrial ATP synthase and plays an important role in coupling proton translocation and ATP production. Here, we describe two individuals, each with homozygous missense variants in ATP5F1D, who presented with episodic lethargy, metabolic acidosis, 3-methylglutaconic aciduria, and hyperammonemia. Subject 1, homozygous for c.245C>T (p.Pro82Leu), presented with recurrent metabolic decompensation starting in the neonatal period, and subject 2, homozygous for c.317T>G (p.Val106Gly), presented with acute encephalopathy in childhood. Cultured skin fibroblasts from these individuals exhibited impaired assembly of F1FO ATP synthase and subsequent reduced complex V activity. Cells from subject 1 also exhibited a significant decrease in mitochondrial cristae. Knockdown of Drosophila ATPsynδ, the ATP5F1D homolog, in developing eyes and brains caused a near complete loss of the fly head, a phenotype that was fully rescued by wild-type human ATP5F1D. In contrast, expression of the ATP5F1D c.245C>T and c.317T>G variants rescued the head-size phenotype but recapitulated the eye and antennae defects seen in other genetic models of mitochondrial oxidative phosphorylation deficiency. Our data establish c.245C>T (p.Pro82Leu) and c.317T>G (p.Val106Gly) in ATP5F1D as pathogenic variants leading to a Mendelian mitochondrial disease featuring episodic metabolic decompensation.
Subjects
Alleles , Metabolic Diseases/genetics , Mitochondrial Proton-Translocating ATPases/genetics , Mutation/genetics , Protein Subunits/genetics , Amino Acid Sequence , Base Sequence , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Loss of Function Mutation/genetics , Male , Mitochondria/metabolism , Mitochondria/ultrastructure , Mitochondrial Proton-Translocating ATPases/chemistry , Protein Subunits/chemistry
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to (meta)data from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.
Subjects
DNA/genetics , Databases, Genetic , Gene Components , Genomics , High-Throughput Nucleotide Sequencing , Metadata , Animals , Caenorhabditis elegans/genetics , Data Display , Datasets as Topic , Drosophila melanogaster/genetics , Forecasting , Genome, Human , Humans , Mice/genetics , User-Computer Interface
ABSTRACT
We discuss a challenging case of a 58-year-old Vietnamese-American woman who presented to her new primary care provider with an 8-year history of slowly progressive dysphagia, hoarseness, muscle weakness with associated frequent falls, and weight loss. She eventually reported dry eyes and dry mouth, and she was diagnosed with Sjogren's syndrome. Subsequently, she was additionally diagnosed with inclusion body myositis and gastric light-chain (AL) amyloidosis. Although inclusion body myositis has been previously associated with Sjogren's syndrome, inclusion body myositis is rare in non-Caucasians, and the trio of Sjogren's syndrome, inclusion body myositis, and AL amyloidosis has not been previously reported. Sjogren's syndrome is a systemic autoimmune condition characterized by ocular and oral dryness. It is one of the most common rheumatologic disorders in the USA and worldwide. Early diagnosis of Sjogren's is particularly important given the frequency and variety of associated autoimmune diseases and extraglandular manifestations. Furthermore, although inclusion body myositis has a low prevalence, it is the most common inflammatory myopathy in older adults and is unfortunately associated with long delays in diagnosis, so knowledge of this disorder is also crucial for practicing internists.
Subjects
Immunoglobulin Light-chain Amyloidosis/complications , Immunoglobulin Light-chain Amyloidosis/diagnosis , Myositis, Inclusion Body/complications , Myositis, Inclusion Body/diagnosis , Sjogren's Syndrome/complications , Sjogren's Syndrome/diagnosis , Female , Humans , Middle Aged
ABSTRACT
BACKGROUND: Despite growing evidence of diagnostic yield and clinical utility of whole exome sequencing (WES) in patients with undiagnosed diseases, there remain significant cost and reimbursement barriers limiting access to such testing. The diagnostic yield and resulting clinical actions of WES for patients who previously faced insurance coverage barriers have not yet been explored. METHODS: We performed a retrospective descriptive analysis of clinical WES outcomes for patients facing insurance coverage barriers prior to clinical WES and who subsequently enrolled in the Undiagnosed Diseases Network (UDN). Clinical WES was completed as a result of participation in the UDN. Payer type, molecular diagnostic yield, and resulting clinical actions were evaluated. RESULTS: Sixty-six patients in the UDN faced insurance coverage barriers to WES at the time of enrollment (67% public payer, 26% private payer). Forty-two of 66 (64%) received insurance denial for clinician-ordered WES, 19/66 (29%) had health insurance through a payer known not to cover WES, and 5/66 (8%) had previous payer denial of other genetic tests. Clinical WES results yielded a molecular diagnosis in 23 of 66 patients (35% [78% pediatric, 65% neurologic indication]). Molecular diagnosis resulted in clinical actions in 14 of 23 patients (61%). CONCLUSIONS: These data demonstrate that a substantial proportion of patients who encountered insurance coverage barriers to WES had a clinically actionable molecular diagnosis, supporting the notion that WES has value as a covered benefit for patients who remain undiagnosed despite objective clinical findings.
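The percentages reported in the RESULTS portion of this abstract can be re-derived from the stated counts. The short check below is only an arithmetic illustration using the numbers given in the abstract; the helper name `percent` and the dictionary labels are hypothetical.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# (numerator, denominator) pairs as stated in the abstract.
counts = {
    "insurance denial of clinician-ordered WES": (42, 66),
    "payer known not to cover WES": (19, 66),
    "previous payer denial of other genetic tests": (5, 66),
    "molecular diagnosis": (23, 66),
    "clinical actions among diagnosed patients": (14, 23),
}

for label, (part, whole) in counts.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```

The three barrier categories sum to 66 patients (42 + 19 + 5), matching the cohort size given in the abstract.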
Subjects
Exome Sequencing , Insurance Coverage , Undiagnosed Diseases/genetics , Child , Child, Preschool , Female , Genetic Testing/methods , Humans , Male , Retrospective Studies , United States
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.
Subjects
Databases, Genetic , Genome, Human , Genomics , Animals , DNA/metabolism , Genes , Humans , Mice , Proteins/metabolism , RNA/metabolism
ABSTRACT
A great many cell types are necessary for the myriad capabilities of complex, multicellular organisms. One interesting aspect of this diversity of cell types is that many cells in diploid organisms are polyploid. This is called endopolyploidy and arises from cell cycles that are often characterized as "variant," but in fact are widespread throughout nature. Endopolyploidy is essential for normal development and physiology in many different organisms. Here we review how both plants and animals use variations of the cell cycle, collectively termed endoreplication, resulting in polyploid cells that support specific aspects of development. In addition, we discuss briefly how endoreplication occurs in response to certain physiological stresses, and how it may contribute to the development of cancer. Finally, we describe the molecular mechanisms that support the onset and progression of endoreplication.
Subjects
Cell Cycle/physiology , DNA Replication/physiology , Polyploidy , Animals , Cell Cycle/genetics , Cell Differentiation , Cell Proliferation , DNA Replication/genetics , Humans , Neoplasms/pathology , Plant Cells , Plant Development , Stress, Physiological/physiology
ABSTRACT
Precise control of cell cycle regulators is critical for normal development and tissue homeostasis. E2F transcription factors are activated during G1 to drive the G1-S transition and are then inhibited during S phase by a variety of mechanisms. Here, we genetically manipulate the single Drosophila activator E2F (E2f1) to explore the developmental requirement for S phase-coupled E2F down-regulation. Expression of an E2f1 mutant that is not destroyed during S phase drives cell cycle progression and causes apoptosis. Interestingly, this apoptosis is not exclusively the result of inappropriate cell cycle progression, because a stable E2f1 mutant that cannot function as a transcription factor or drive cell cycle progression also triggers apoptosis. This observation suggests that the inappropriate presence of E2f1 protein during S phase can trigger apoptosis by mechanisms that are independent of E2F acting directly at target genes. The ability of S phase-stabilized E2f1 to trigger apoptosis requires an interaction between E2f1 and the Drosophila pRb homolog, Rbf1, and involves induction of the pro-apoptotic gene, hid. Simultaneously blocking E2f1 destruction during S phase and inhibiting the induction of apoptosis results in tissue overgrowth and lethality. We propose that inappropriate accumulation of E2f1 protein during S phase triggers the elimination of potentially hyperplastic cells via apoptosis in order to ensure normal development of rapidly proliferating tissues.
Subjects
Drosophila melanogaster/metabolism , E2F1 Transcription Factor/metabolism , Gene Expression Regulation, Developmental , Homeostasis/genetics , Larva/metabolism , Animals , Apoptosis/genetics , Cell Proliferation , DNA/biosynthesis , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , E2F1 Transcription Factor/genetics , G1 Phase/genetics , Larva/genetics , Mutation , Neuropeptides/genetics , Neuropeptides/metabolism , Proteolysis , Retinoblastoma Protein , S Phase/genetics , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
Microbiomes have gained significant attention in ecological research, owing to their diverse interactions and essential roles within different organismal ecosystems. Microorganisms, such as bacteria, archaea, and viruses, have a profound impact on host health, influencing digestion, metabolism, immune function, tissue development, and behavior. This study investigates the microbiome diversity and function of Kellet's whelk (Kelletia kelletii) perivitelline fluid (PVF), which sustains thousands of developing K. kelletii embryos within a polysaccharide and protein matrix. Our core microbiome analysis reveals a diverse range of bacteria, with the Roseobacter genus being the most abundant. Additionally, genes related to host-microbe interactions, symbiosis, and quorum sensing were detected, indicating a potential symbiotic relationship between the microbiome and Kellet's whelk embryos. Furthermore, the microbiome exhibits gene expression related to antibiotic biosynthesis, suggesting a defensive role against pathogenic bacteria and potential discovery of novel antibiotics. Overall, this study sheds light on the microbiome's role in Kellet's whelk development, emphasizing the significance of host-microbe interactions in vulnerable life history stages. To our knowledge, ours is the first study to use 16S sequencing coupled with RNA sequencing (RNA-seq) to profile the microbiome of an invertebrate PVF. IMPORTANCE: This study provides novel insight into an encapsulated system with strong evidence of symbiosis between the microbial inhabitants and developing host embryos. The Kellet's whelk perivitelline fluid (PVF) contains microbial organisms of interest that may be providing symbiotic functions and potential antimicrobial properties during this vulnerable life history stage. This study, the first to utilize a comprehensive approach to investigating the Kellet's whelk PVF microbiome, couples 16S rRNA gene long-read sequencing with RNA-seq. This research contributes to and expands our knowledge of the roles of beneficial host-associated microbes.
ABSTRACT
Next-generation sequencing technologies, such as Nanopore MinION, Illumina HiSeq and NovaSeq, and PacBio Sequel II, hold immense potential for advancing genomic research on non-model organisms, including the vast majority of marine species. However, application of these technologies to marine invertebrate species is often impeded by challenges in extracting and purifying their genomic DNA due to high polysaccharide content and other secondary metabolites. In this study, we help resolve this issue by developing and testing DNA extraction protocols for Kellet's whelk (Kelletia kelletii), a subtidal gastropod with ecological and commercial importance, by comparing four DNA extraction methods commonly used in marine invertebrate studies. In our comparison of extraction methods, the Salting Out protocol was the least expensive, produced the highest DNA yields, produced consistently high DNA quality, and had low toxicity. We validated the protocol using an independent set of tissue samples, then applied it to extract high-molecular-weight (HMW) DNA from over three thousand Kellet's whelk tissue samples. The protocol demonstrated scalability and, with added clean-up, suitability for RAD-seq, GT-seq, as well as whole genome sequencing using both long read (ONT MinION) and short read (Illumina NovaSeq) sequencing platforms. Our findings offer a robust and versatile DNA extraction and clean-up protocol for supporting genomic research on non-model marine organisms, to help mitigate the under-representation of invertebrates in genomic studies.
Subjects
Gastropoda , Animals , Gastropoda/genetics , Genome/genetics , Genomics , DNA/genetics , Sequence Analysis, DNA/methods
ABSTRACT
Ascidians have the potential to reveal fundamental biological insights related to coloniality, regeneration, immune function, and the evolution of these traits. This study implements a hybrid assembly technique to produce a genome assembly and annotation for the botryllid ascidian, Botrylloides violaceus. A hybrid genome assembly was produced using Illumina, Inc. short-read and Oxford Nanopore Technologies long-read sequencing technologies. The resulting assembly comprises 831 contigs, has a total length of 121 Mbp, N50 of 1 Mbp, and a BUSCO score of 96.1%. Genome annotation identified 13 K protein-coding genes. Comparative genomic analysis with other tunicates reveals patterns of conservation and divergence within orthologous gene families even among closely related species. Characterization of the Wnt gene family, encoding signaling ligands involved in development and regeneration, reveals conserved patterns of subfamily presence and gene copy number among botryllids. This supports the use of genomic data from nonmodel organisms in the investigation of biological phenomena.
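For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below shows the standard definition: the length of the contig at which the cumulative length, taken from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are hypothetical; they are not taken from the Botrylloides violaceus assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the contig length at which the running total first reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths supplied")


# Toy contig lengths (invented): total 32, so the threshold is 16.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # 8
```

A larger N50 for the same total length generally indicates a more contiguous assembly, which is why it is reported alongside contig count and BUSCO completeness.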
Subjects
Urochordata , Animals , Urochordata/genetics , Genomics/methods , Genome , Gene Dosage , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation
ABSTRACT
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.
ABSTRACT
Cyclic AMP has a crucial role during the entire developmental program of the social amoeba Dictyostelium, acting both as an intracellular second messenger and, when secreted, as a directional cue that is relayed to neighboring cells during chemotaxis. Although significant knowledge about cAMP production in chemotaxing cells has been derived from studies performed on cell populations, cAMP dynamics at the single cell level have not been investigated. To examine this, we used a FRET-based cAMP sensor that possesses high cAMP sensitivity and great temporal resolution. We show the transient profile of cAMP accumulation in live Dictyostelium cells and establish that chemoattractants control intracellular cAMP dynamics by regulating synthesis via the adenylyl cyclase ACA. aca(-) cells show no significant change in FRET response following chemoattractant addition. Furthermore, cells lacking ACB, the other adenylyl cyclase expressed in chemotaxing cells, behave similarly to wild-type cells. We also establish that RegA is the major phosphodiesterase that degrades intracellular cAMP in chemotaxis-competent cells. Interestingly, we failed to measure intracellular cAMP compartmentalization in actively chemotaxing cells. We conclude that cytosolic cAMP, which is destined to activate PKA, is regulated by ACA and RegA and does not compartmentalize during chemotaxis.
Subjects
Cyclic AMP/metabolism , Dictyostelium/cytology , Dictyostelium/metabolism , Adenylyl Cyclases/genetics , Adenylyl Cyclases/metabolism , Chemotaxis , Cytosol/metabolism , Dictyostelium/enzymology , Dictyostelium/genetics , Fluorescence Resonance Energy Transfer , Protozoan Proteins/genetics , Protozoan Proteins/metabolism
ABSTRACT
Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks experience underspecification, where multiple combinations of parameters achieve similar performance, both in training and validation. Additionally, certain parameter combinations may perform poorly when the test distribution differs from the training distribution. Embedding prior knowledge from the literature may address this issue by boosting predictive models that provide crucial, in-depth information about a given disease. Breast cancer research provides a wealth of such knowledge, particularly in the form of subtype biomarkers and genetic signatures. In this study, we draw on past research on breast cancer subtype biomarkers, label propagation, and neural graph machines to present a novel methodology for embedding knowledge into machine learning systems. We embed prior knowledge into the loss function in the form of inter-subject distances derived from a well-known published breast cancer signature. Our results show that this methodology reduces predictor variability on state-of-the-art deep learning architectures and increases predictor consistency leading to improved interpretation. We find that pathway enrichment analysis is more consistent after embedding knowledge. This novel method applies to a broad range of existing studies and predictive models. Our method moves the traditional synthesis of predictive models from an arbitrary assignment of weights to genes toward a more biologically meaningful approach of incorporating knowledge.
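The idea of embedding inter-subject distances into the loss function can be sketched in the general style of neural graph machines: a supervised term plus a penalty that pulls together the learned representations of subjects that the prior knowledge marks as similar. The abstract does not give the authors' exact formulation, so everything below is an assumption for illustration: the function names, the weighting factor `alpha`, and the use of pairwise similarity weights (which would have to be derived from the signature-based distances in some way not specified here).

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-probability assigned to each subject's true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def graph_penalty(embeddings: Sequence[Sequence[float]],
                  weights: Sequence[Sequence[float]]) -> float:
    """Sum over subject pairs i < j of w_ij times the squared distance between embeddings."""
    total = 0.0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            total += weights[i][j] * sq_dist
    return total


def knowledge_embedded_loss(probs, labels, embeddings, weights, alpha: float = 0.1) -> float:
    """Supervised loss plus alpha times the prior-knowledge penalty (alpha is a hypothetical knob)."""
    return cross_entropy(probs, labels) + alpha * graph_penalty(embeddings, weights)


# Toy inputs (invented): two subjects, two classes, two-dimensional embeddings.
probs = [[0.5, 0.5], [0.5, 0.5]]
labels = [0, 1]
embeddings = [[0.0, 0.0], [1.0, 0.0]]
weights = [[0.0, 2.0], [2.0, 0.0]]  # hypothetical similarity from a signature-based distance
loss = knowledge_embedded_loss(probs, labels, embeddings, weights, alpha=0.5)
print(loss)
```

In this form, a larger similarity weight for a pair of subjects makes it more costly for the network to place them far apart, which is one plausible way prior knowledge could constrain otherwise underspecified parameters.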