Results 1 - 20 of 316
1.
Science ; 384(6698): eadh0829, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781368

ABSTRACT

Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.


Subject(s)
Autism Spectrum Disorder; Brain; Genome-Wide Association Study; Protein Isoforms; Quantitative Trait Loci; Schizophrenia; Humans; Brain/metabolism; Brain/growth & development; Brain/embryology; Schizophrenia/genetics; Autism Spectrum Disorder/genetics; Protein Isoforms/genetics; Protein Isoforms/metabolism; Transcriptome; RNA Splicing; Gene Expression Regulation, Developmental; Alternative Splicing; Atlases as Topic; Gene Regulatory Networks
2.
Sci Adv ; 10(21): eadj4452, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781344

ABSTRACT

Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.


Subject(s)
Brain; Epigenesis, Genetic; Regulatory Sequences, Nucleic Acid; Humans; Brain/metabolism; Regulatory Sequences, Nucleic Acid/genetics; Animals; Evolution, Molecular; Mental Disorders/genetics; Regulatory Elements, Transcriptional/genetics; Neurons/metabolism; Gene Expression Regulation; Transcription Factors/genetics; Transcription Factors/metabolism
3.
bioRxiv ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-38014075

ABSTRACT

Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.

4.
bioRxiv ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37986960

ABSTRACT

Aging brings dysregulation of various processes across organs and tissues, often stemming from stochastic damage to individual cells over time. Here, we used a combination of single-nucleus RNA-sequencing and single-cell whole-genome sequencing to identify transcriptomic and genomic changes in the prefrontal cortex of the human brain across life span, from infancy to centenarian. We identified infant-specific cell clusters enriched for the expression of neurodevelopmental genes, and a common down-regulation of cell-essential homeostatic genes that function in ribosomes, transport, and metabolism during aging across cell types. Conversely, expression of neuron-specific genes generally remains stable throughout life. We observed a decrease in specific DNA repair genes in aging, including genes implicated in generating brain somatic mutations as indicated by mutation signature analysis. Furthermore, we detected gene-length-specific somatic mutation rates that shape the transcriptomic landscape of the aged human brain. These findings elucidate critical aspects of human brain aging, shedding light on transcriptomic and genomic dynamics.

5.
Hepatol Commun ; 7(10), 2023 10 01.
Article in English | MEDLINE | ID: mdl-37756045

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have identified 30 risk loci for primary sclerosing cholangitis (PSC). Variants within these loci are found predominantly in noncoding regions of DNA making their mechanisms of conferring risk hard to define. Epigenomic studies have shown noncoding variants broadly impact regulatory element activity. The possible association of noncoding PSC variants with regulatory element activity has not been studied. We aimed to (1) determine if the noncoding risk variants in PSC impact regulatory element function and (2) if so, assess the role these regulatory elements have in explaining the genetic risk for PSC. METHODS: Available epigenomic datasets were integrated to build a comprehensive atlas of cell type-specific regulatory elements, emphasizing PSC-relevant cell types. RNA-seq and ATAC-seq were performed on peripheral CD4+ T cells from 10 PSC patients and 11 healthy controls. Computational techniques were used to (1) study the enrichment of PSC-risk variants within regulatory elements, (2) correlate risk genotype with differences in regulatory element activity, and (3) identify regulatory elements differentially active and genes differentially expressed between PSC patients and controls. RESULTS: Noncoding PSC-risk variants are strongly enriched within immune-specific enhancers, particularly ones involved in T-cell response to antigenic stimulation. In total, 250 genes and >10,000 regulatory elements were identified that are differentially active between patients and controls. CONCLUSIONS: Mechanistic effects are proposed for variants at 6 PSC-risk loci where genotype was linked with differential T-cell regulatory element activity. Regulatory elements are shown to play a key role in PSC pathophysiology.


Subject(s)
Cholangitis, Sclerosing; Genome-Wide Association Study; Humans; Cholangitis, Sclerosing/genetics; Chromatin Immunoprecipitation Sequencing; Genotype
6.
bioRxiv ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37546958

ABSTRACT

From nematodes to placental mammals, key components of the germline transposon-silencing piRNA pathway localize to phase-separated perinuclear granules. In Drosophila, the PIWI protein Aub, DEAD box protein Vasa and helicase Armi localize to nuage granules and are required for ping-pong piRNA amplification and phased piRNA processing. Drosophila piRNA mutants lead to genome instability and Chk2 kinase DNA damage signaling. By systematically analyzing piRNA pathway organization, small RNA production, and long RNA expression in single piRNA mutants and corresponding chk2/mnk double mutants, we show that Chk2 activation disrupts nuage localization of Aub and Vasa, and that the HP1 homolog Rhino, which drives piRNA precursor transcription, is required for Aub, Vasa, and Armi localization to nuage. However, these studies also show that ping-pong amplification and phased piRNA biogenesis are independent of nuage localization of Vasa, Aub and Armi. Dispersed cytoplasmic proteins thus appear to mediate these essential piRNA pathway functions.

7.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.

8.
iScience ; 26(6): 106896, 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37332597

ABSTRACT

Hidradenitis suppurativa (HS) is a skin disorder that causes chronic painful inflammation and hyperproliferation, often with the comorbidity of invasive keratoacanthoma (KA). Our research, employing high-resolution immunofluorescence and data science approaches together with confirmatory molecular analysis, has identified that the 5'-cap-dependent protein translation regulatory complex eIF4F is a key factor in the development of HS and is responsible for regulating follicular hyperproliferation. Specifically, eIF4F translational targets, Cyclin D1 and c-MYC, orchestrate the development of HS-associated KA. Although eIF4F and p-eIF4E are contiguous throughout HS lesions, Cyclin D1 and c-MYC have unique spatial localization and functions. The keratin-filled crater of KA is formed by nuclear c-MYC-induced differentiation of epithelial cells, whereas the co-localization of c-MYC and Cyclin D1 provides oncogenic transformation by activating RAS, PI3K, and ERK pathways. In sum, we have revealed a novel mechanism underlying HS pathogenesis of follicular hyperproliferation and the development of HS-associated invasive KA.

9.
Science ; 380(6643): eabn7930, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104580

ABSTRACT

Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element-derived and exhibit intricate patterns of gains and losses during primate evolution whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.


Subject(s)
Evolution, Molecular; Genome, Human; Mammals; Regulatory Elements, Transcriptional; Transcription Factors; Animals; Humans; Binding Sites; DNA Transposable Elements; Mammals/classification; Mammals/genetics; Primates/classification; Primates/genetics; Transcription Factors/genetics; Transcription Factors/metabolism; Phylogeny
10.
Science ; 380(6643): eabn2937, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104612

ABSTRACT

Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.


Subject(s)
Disease; Genetic Variation; Animals; Humans; Biological Evolution; Genome, Human; Genome-Wide Association Study; Genomics; Molecular Sequence Annotation; Polymorphism, Single Nucleotide; Disease/genetics
11.
Hum Genet ; 142(8): 1091-1111, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36935423

ABSTRACT

Regulatory elements are the genomic regions that interact with transcription factors to control cell-type-specific gene expression in different cellular environments. A precise and complete catalog of functional elements encoded by the human genome is key to understanding mammalian gene regulation. Here, we review the current state of regulatory element annotation. We first provide an overview of assays for characterizing functional elements, including genome, epigenome, transcriptome, three-dimensional chromatin interaction, and functional validation assays. We then discuss computational methods for defining regulatory elements, including peak-calling and other statistical modeling methods. Finally, we introduce several high-quality lists of regulatory element annotations and suggest potential future directions.


Subject(s)
Chromatin; Regulatory Sequences, Nucleic Acid; Animals; Humans; Regulatory Sequences, Nucleic Acid/genetics; Chromatin/genetics; Gene Expression Regulation; Genomics/methods; Mammals/genetics; Genome, Human
12.
Neuron ; 111(9): 1381-1390.e6, 2023 05 03.
Article in English | MEDLINE | ID: mdl-36931278

ABSTRACT

GGGGCC repeat expansion in the C9ORF72 gene is the most common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). Repeat RNAs can be translated into dipeptide repeat proteins, including poly(GR), whose mechanisms of action remain largely unknown. In an RNA-seq analysis of poly(GR) toxicity in Drosophila, we found that several antimicrobial peptide genes, such as metchnikowin (Mtk), and heat shock protein (Hsp) genes are activated. Mtk knockdown in the fly eye or in all neurons suppresses poly(GR) neurotoxicity. These findings suggest a cell-autonomous role of Mtk in neurodegeneration. Hsp90 knockdown partially rescues both poly(GR) toxicity in flies and neurodegeneration in C9ORF72 motor neurons derived from induced pluripotent stem cells (iPSCs). Topoisomerase II (TopoII) regulates poly(GR)-induced upregulation of Hsp90 and Mtk. TopoII knockdown also suppresses poly(GR) toxicity in Drosophila and improves survival of C9ORF72 iPSC-derived motor neurons. These results suggest potential novel therapeutic targets for C9ORF72-ALS/FTD.


Subject(s)
Amyotrophic Lateral Sclerosis; Frontotemporal Dementia; Animals; Amyotrophic Lateral Sclerosis/genetics; C9orf72 Protein/genetics; C9orf72 Protein/metabolism; Dipeptides/genetics; DNA Repeat Expansion; Down-Regulation; Drosophila/metabolism; Frontotemporal Dementia/genetics; Frontotemporal Dementia/metabolism; Motor Neurons/metabolism
13.
bioRxiv ; 2023 Mar 10.
Article in English | MEDLINE | ID: mdl-36945512

ABSTRACT

Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

14.
medRxiv ; 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36945630

ABSTRACT

Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.

15.
Nucleic Acids Res ; 51(5): 2066-2086, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36762470

ABSTRACT

Transposons are mobile genetic elements prevalent in the genomes of most species. The distribution of transposons within a genome reflects the actions of two opposing processes: initial insertion site selection, and selective pressure from the host. By analyzing whole-genome sequencing data from transposon-activated Drosophila melanogaster, we identified 43 316 de novo and 237 germline insertions from four long-terminal-repeat (LTR) transposons, one LINE transposon (I-element), and one DNA transposon (P-element). We found that all transposon types favored insertion into promoters de novo, but otherwise displayed distinct insertion patterns. De novo and germline P-element insertions preferred replication origins, often landing in a narrow region around transcription start sites and in regions of high chromatin accessibility. De novo LTR transposon insertions preferred regions with high H3K36me3, promoters and exons of active genes; within genes, LTR insertion frequency correlated with gene expression. De novo I-element insertion density increased with distance from the centromere. Germline I-element and LTR transposon insertions were depleted in promoters and exons, suggesting strong selective pressure to remove transposons from functional elements. Transposon movement is associated with genome evolution and disease; therefore, our results can improve our understanding of genome and disease biology.


Subject(s)
DNA Transposable Elements; Drosophila melanogaster; Animals; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; DNA Transposable Elements/genetics; Chromosomes; Base Sequence; Epigenesis, Genetic
16.
Reproduction ; 165(2): 183-196, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36395073

ABSTRACT

In brief: The testis-specific transcription factor, TCFL5, expressed in pachytene spermatocytes regulates the meiotic gene expression program in collaboration with the transcription factor A-MYB. Abstract: In male mice, the transcription factors STRA8 and MEISON initiate meiosis I. We report that STRA8/MEISON activates the transcription factors A-MYB and TCFL5, which together reprogram gene expression after spermatogonia enter into meiosis. TCFL5 promotes the transcription of genes required for meiosis, mRNA turnover, miR-34/449 production, meiotic exit, and spermiogenesis. This transcriptional architecture is conserved in rhesus macaque, suggesting TCFL5 plays a central role in meiosis and spermiogenesis in placental mammals. Tcfl5em1/em1 mutants are sterile, and spermatogenesis arrests at the mid- or late-pachytene stage of meiosis. Moreover, Tcfl5+/em1 mutants produce fewer motile sperm.


Subject(s)
Placenta; Transcription Factors; Animals; Female; Male; Mice; Pregnancy; Macaca mulatta/metabolism; Mammals/metabolism; Meiosis; Placenta/metabolism; Semen/metabolism; Spermatocytes/metabolism; Spermatogenesis/genetics; Testis/metabolism; Transcription Factors/metabolism
17.
Nucleic Acids Res ; 51(D1): D1300-D1311, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350676

ABSTRACT

Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.


Subject(s)
Genome, Human; Software; Humans; Molecular Sequence Annotation; Genomics; Genotype; Genetic Variation
18.
RNA ; 2022 Oct 14.
Article in English | MEDLINE | ID: mdl-36241367

ABSTRACT

In male mice, the transcription factor A-MYB initiates the transcription of pachytene piRNA genes during meiosis. Here, we report that A-MYB activates the transcription factor Tcfl5 produced in pachytene spermatocytes. Subsequently, A-MYB and TCFL5 reciprocally reinforce their own transcription to establish a positive feedback circuit that triggers pachytene piRNA production. TCFL5 regulates the expression of genes required for piRNA maturation and promotes transcription of evolutionarily young pachytene piRNA genes, whereas A-MYB activates the transcription of older pachytene piRNA genes. Intriguingly, pachytene piRNAs from TCFL5-dependent young loci initiate the production of piRNAs from A-MYB-dependent older loci, ensuring the self-propagation of pachytene piRNAs. A-MYB and TCFL5 act via a set of incoherent feedforward loops that drive regulation of gene expression by pachytene piRNAs during spermatogenesis. This regulatory architecture is conserved in rhesus macaque, suggesting that it was present in the last common ancestor of placental mammals.

19.
Noncoding RNA ; 8(5), 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36287122

ABSTRACT

Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although there are over 100,000 samples with available RNA sequencing (RNA-seq) data, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential, but this approach has a false discovery rate of 30-75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which is unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed software, Flnc, to accurately identify both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built by incorporating four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power with an AUROC score over 0.92. Flnc significantly improves the prediction accuracy from less than 50% using the conventional approach to over 85%. Flnc is available on GitHub.

20.
Hum Mol Genet ; 31(R1): R114-R122, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36083269

ABSTRACT

Every cell in the human body inherits a copy of the same genetic information. The three billion base pairs of DNA in the human genome, and the roughly 50 000 coding and non-coding genes they contain, must thus encode all the complexity of human development and cell and tissue type diversity. Differences in gene regulation, or the modulation of gene expression, enable individual cells to interpret the genome differently to carry out their specific functions. Here we discuss recent and ongoing efforts to build gene regulatory maps, which aim to characterize the regulatory roles of all sequences in a genome. Many researchers and consortia have identified such regulatory elements using functional assays and evolutionary analyses; we discuss the results, strengths and shortcomings of their approaches. We also discuss new techniques the field can leverage and emerging challenges it will face while striving to build gene regulatory maps of ever-increasing resolution and comprehensiveness.


Subject(s)
Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Humans; Gene Expression Regulation/genetics; Genome, Human/genetics; Chromosome Mapping; DNA/genetics