Results 1 - 20 of 33
1.
Mol Ther Nucleic Acids ; 35(2): 102202, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38846999

ABSTRACT

Splicing factor 3b subunit 1 (SF3B1) is the largest subunit and core component of the spliceosome. Inhibition of SF3B1 was associated with an increase in broad intron retention (IR) on most transcripts, suggesting that IR can be used as a marker of spliceosome inhibition in chronic lymphocytic leukemia (CLL) cells. Furthermore, we separately analyzed exonic and intronic mapped reads on annotated RNA-sequencing transcripts obtained from B cells (n = 98 CLL patients) and healthy volunteers (n = 9). We measured intron/exon ratio to use that as a surrogate for alternative RNA splicing (ARS) and found that 66% of CLL-B cell transcripts had significant IR elevation compared with normal B cells (NBCs) and that correlated with mRNA downregulation and low expression levels. Transcripts with the highest IR levels belonged to biological pathways associated with gene expression and RNA splicing. A >2-fold increase of active pSF3B1 was observed in CLL-B cells compared with NBCs. Additionally, when the CLL-B cells were treated with macrolides (pladienolide-B), a significant decrease in pSF3B1, but not total SF3B1 protein, was observed. These findings suggest that IR/ARS is increased in CLL, which is associated with SF3B1 phosphorylation and susceptibility to SF3B1 inhibitors. These data provide additional support to the relevance of ARS in carcinogenesis and evidence of pSF3B1 participation in this process.
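
The intron/exon ratio described in this abstract can be pictured as a per-transcript ratio of read densities. The sketch below is illustrative only: the abstract does not state how reads were normalized, so the function name `intron_exon_ratio`, the per-base density, and the pseudocount are assumptions rather than the authors' method.

```python
def intron_exon_ratio(intron_reads, exon_reads, intron_len, exon_len, pseudocount=1.0):
    """Length-normalized intron/exon read-density ratio for one transcript.

    Assumption: reads are normalized by total intronic and exonic length,
    and a pseudocount guards against division by zero. Neither choice is
    taken from the abstract.
    """
    if intron_len <= 0 or exon_len <= 0:
        raise ValueError("feature lengths must be positive")
    intron_density = (intron_reads + pseudocount) / intron_len
    exon_density = (exon_reads + pseudocount) / exon_len
    return intron_density / exon_density


# Hypothetical counts: a higher ratio would indicate more intron retention.
ratio = intron_exon_ratio(intron_reads=99, exon_reads=999,
                          intron_len=1000, exon_len=1000)
```

Comparing such ratios between two groups of samples would still need a statistical test, which the sketch does not attempt.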

2.
bioRxiv ; 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38352549

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity. Although scRNA-seq reads from most prevalent and popular tagged-end protocols are expected to arise from the 3' end of polyadenylated RNAs, recent studies have shown that "off-target" reads can constitute a substantial portion of the read population. In this work, we introduced scCensus, a comprehensive analysis workflow for systematically evaluating and categorizing off-target reads in scRNA-seq. We applied scCensus to seven scRNA-seq datasets. Our analysis of intergenic reads shows that these off-target reads contain information about chromatin structure and can be used to identify similar cells across modalities. Our analysis of antisense reads suggests that these reads can be used to improve gene detection and capture interesting transcriptional activities like antisense transcription. Furthermore, using splice-aware quantification, we find that spliced and unspliced reads provide distinct information about cell clusters and biomarkers, suggesting the utility of integrating signals from reads with different splicing statuses. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.

3.
Hum Genet ; 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38170232

ABSTRACT

Variants which disrupt splicing are a frequent cause of rare disease that have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.

4.
Plant Physiol ; 193(2): 1016-1035, 2023 09 22.
Article in English | MEDLINE | ID: mdl-37440715

ABSTRACT

Belonging to Rosaceae, red raspberry (Rubus idaeus) and wild strawberry (Fragaria vesca) are closely related species with distinct fruit types. While the numerous ovaries become the juicy drupelet fruits in raspberry, their strawberry counterparts become dry and tasteless achenes. In contrast, while the strawberry receptacle, the stem tip, enlarges to become a red fruit, the raspberry receptacle shrinks and dries. The distinct fruit-forming ability of homologous organs in these 2 species allows us to investigate fruit type determination. We assembled and annotated the genome of red raspberry (R. idaeus) and characterized its fruit development morphologically and physiologically. Subsequently, transcriptomes of dissected and staged raspberry fruit tissues were compared to those of strawberry from a prior study. Class B MADS box gene expression was negatively associated with fruit-forming ability, which suggested a conserved inhibitory role of class B heterodimers, PISTILLATA/TM6 or PISTILLATA/APETALA3, for fruit formation. Additionally, the inability of strawberry ovaries to develop into fruit flesh was associated with highly expressed lignification genes and extensive lignification of the ovary pericarp. Finally, coexpressed gene clusters preferentially expressed in the dry strawberry achenes were enriched in "cell wall biosynthesis" and "ABA signaling," while coexpressed clusters preferentially expressed in the fleshy raspberry drupelets were enriched in "protein translation." Our work provides extensive genomic resources as well as several potential mechanisms underlying fruit type specification. These findings provide the framework for understanding the evolution of different fruit types, a defining feature of angiosperms.


Subject(s)
Fragaria; Rubus; Rubus/genetics; Fruit/metabolism; Transcriptome/genetics; Genomics
5.
Genome Res ; 33(7): 1089-1100, 2023 07.
Article in English | MEDLINE | ID: mdl-37316351

ABSTRACT

Recent studies exploring the impact of methylation in tumor evolution suggest that although the methylation status of many CpG sites is preserved across distinct lineages, that of others is altered as the cancer progresses. Because changes in the methylation status of a CpG site may be retained in mitosis, they could be used to infer the progression history of a tumor via single-cell lineage tree reconstruction. In this work, we introduce the first principled distance-based computational method, Sgootr, for inferring a tumor's single-cell methylation lineage tree and for jointly identifying lineage-informative CpG sites that harbor changes in methylation status that are retained along the lineage. We apply Sgootr to single-cell bisulfite-treated whole-genome sequencing data of multiregionally sampled tumor cells from nine metastatic colorectal cancer patients, as well as multiregionally sampled single-cell reduced-representation bisulfite sequencing data from a glioblastoma patient. We show that the tumor lineages constructed reveal a simple model underlying tumor progression and metastatic seeding. A comparison of Sgootr against alternative approaches shows that Sgootr can construct lineage trees with fewer migration events and in closer concordance with the sequential-progression model of tumor evolution, with a running time that is a fraction of that used in prior studies. Lineage-informative CpG sites identified by Sgootr are in inter-CpG island (CGI) regions, as opposed to intra-CGIs, which have been the main regions of interest in genomic methylation-related analyses.


Subject(s)
DNA Methylation; Neoplasms; Humans; DNA Methylation/genetics; Sulfites; Sequence Analysis, DNA/methods; Genome; Neoplasms/genetics; CpG Islands/genetics
6.
Hortic Res ; 10(12): uhad240, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38162465

ABSTRACT

Rosaceae is a large plant family consisting of many economically important fruit crops including peach, apple, pear, strawberry, raspberry, plum, and others. Investigations into their growth and development will promote both basic understanding and progress toward increasing fruit yield and quality. With the ever-increasing high-throughput sequencing data of Rosaceae, comparative studies are hindered by inconsistency of sample collection with regard to tissue, stage, growth conditions, and by vastly different handling of the data. Therefore, databases that enable easy access and effective utilization of directly comparable transcript data are highly desirable. Here, we describe a database for comparative analysis, ROsaceae Fruit Transcriptome database (ROFT), based on RNA-seq data generated from the same laboratory using similarly dissected and staged fruit tissues of four important Rosaceae fruit crops: apple, peach, strawberry, and red raspberry. Hence, the database is unique in allowing easy and robust comparisons among fruit gene expression across the four species. ROFT enables researchers to query orthologous genes and their expression patterns during different fruit developmental stages in the four species, identify tissue-specific and tissue-/stage-specific genes, visualize and compare ortholog expression in different fruit types, explore consensus co-expression networks, and download different data types. The database provides users access to vast amounts of RNA-seq data across the four economically important fruits, enables investigations of fruit type specification and evolution, and facilitates the selection of genes with critical roles in fruit development for further studies.

7.
Cancers (Basel) ; 14(23), 2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36497367

ABSTRACT

Cancer occurs more frequently in men while autoimmune diseases (AIDs) occur more frequently in women. To explore whether these sex biases have a common basis, we collected 167 AID incidence studies from many countries for tissues that have both a cancer type and an AID that arise from that tissue. Analyzing a total of 182 country-specific, tissue-matched cancer-AID incidence rate sex bias data pairs, we find that, indeed, the sex biases observed in the incidence of AIDs and cancers that occur in the same tissue are positively correlated across human tissues. The common key factor whose levels across human tissues are most strongly associated with these incidence rate sex biases is the sex bias in the expression of the 37 genes encoded in the mitochondrial genome.

8.
Plant J ; 109(6): 1614-1629, 2022 03.
Article in English | MEDLINE | ID: mdl-34905278

ABSTRACT

Fruits represent key evolutionary innovations in angiosperms and exhibit diverse types adapted for seed dissemination. However, the mechanisms that underlie fruit type diversity are not understood. The Rosaceae family comprises many different fruit types, including 'pome' and 'drupe' fruits, and hence is an excellent family for investigating the genetic basis of fruit type specification. Using comparative transcriptomics, we investigated the molecular events that correlate with pome (apple) and drupe (peach) fleshy fruit development, focusing on the earliest stages of fruit initiation. We identified PI and TM6, MADS box genes whose expression negatively correlates with fruit flesh-forming tissues irrespective of fruit type. In addition, the MADS box gene FBP9 is expressed in fruit-forming tissues in both species, and was lost multiple times in the genomes of dry-fruit-forming eudicots including Arabidopsis. Network analysis reveals co-expression between FBP9 and photosynthesis genes in both apple and peach, suggesting that FBP9 and photosynthesis may both promote fleshy fruit development. The large transcriptomic datasets at the earliest stages of pome and drupe fruit development provide rich resources for comparative studies, and the work provides important insights into fruit-type specification.


Subject(s)
Malus; Prunus persica; Rosaceae; Fruit/metabolism; Gene Expression Regulation, Plant/genetics; Malus/genetics; Prunus persica/genetics; Rosaceae/genetics; Transcriptome/genetics
9.
Plant Commun ; 2(2): 100101, 2021 03 08.
Article in English | MEDLINE | ID: mdl-33898973

ABSTRACT

The most popular CRISPR-SpCas9 system recognizes canonical NGG protospacer adjacent motifs (PAMs). Previously engineered SpCas9 variants, such as Cas9-NG, favor G-rich PAMs in genome editing. In this manuscript, we describe a new plant genome-editing system based on a hybrid iSpyMacCas9 platform that allows for targeted mutagenesis, C to T base editing, and A to G base editing at A-rich PAMs. This study fills a major technology gap in the CRISPR-Cas9 system for editing NAAR PAMs in plants, which greatly expands the targeting scope of CRISPR-Cas9. Finally, our vector systems are fully compatible with Gateway cloning and will work with all existing single-guide RNA expression systems, facilitating easy adoption of the systems by others. We anticipate that more tools, such as prime editing, homology-directed repair, CRISPR interference, and CRISPR activation, will be further developed based on our promising iSpyMacCas9 platform.
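
The difference between the canonical NGG PAM and the A-rich NAAR PAM can be made concrete by scanning a sequence for IUPAC motifs (N = any base, R = A or G). This is a minimal sketch, not part of the published system: the names `IUPAC` and `find_pam_sites` are hypothetical, only the forward strand is scanned, and no guide design or scoring is attempted.

```python
import re

# IUPAC nucleotide codes needed for the two PAMs named in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def find_pam_sites(seq, pam):
    """Return 0-based start positions of every PAM match on the given strand.

    A zero-width lookahead is used so that overlapping matches are reported.
    """
    motif = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile("(?=(" + motif + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequences, chosen only to show the two motifs.
naar_sites = find_pam_sites("TTCAAGTT", "NAAR")   # CAAG starts at index 2
ngg_sites = find_pam_sites("ACGGTGG", "NGG")      # CGG at 1, TGG at 4
```

A fuller tool would also scan the reverse complement and extract the protospacer adjacent to each hit.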


Subject(s)
CRISPR-Cas Systems; Gene Editing/methods; Genome, Plant; Oryza/genetics; Triticum/genetics; Zea mays/genetics
10.
Nat Commun ; 12(1): 1944, 2021 03 29.
Article in English | MEDLINE | ID: mdl-33782402

ABSTRACT

CRISPR-Cas12a is a promising genome editing system for targeting AT-rich genomic regions. Comprehensive genome engineering requires simultaneous targeting of multiple genes at defined locations. Here, to expand the targeting scope of Cas12a, we screen nine Cas12a orthologs that have not been demonstrated in plants, and identify six, ErCas12a, Lb5Cas12a, BsCas12a, Mb2Cas12a, TsCas12a and MbCas12a, that possess high editing activity in rice. Among them, Mb2Cas12a stands out with high editing efficiency and tolerance to low temperature. An engineered Mb2Cas12a-RVRR variant enables editing with more relaxed PAM requirements in rice, yielding two times higher genome coverage than the wild type SpCas9. To enable large-scale genome engineering, we compare 12 multiplexed Cas12a systems and identify a potent system that exhibits nearly 100% biallelic editing efficiency with the ability to target as many as 16 sites in rice. This is the highest level of multiplex edits in plants to date using Cas12a. Two compact single transcript unit CRISPR-Cas12a interference systems are also developed for multi-gene repression in rice and Arabidopsis. This study greatly expands the targeting scope of Cas12a for crop genome engineering.


Subject(s)
Arabidopsis/genetics; Bacterial Proteins/genetics; CRISPR-Associated Proteins/genetics; CRISPR-Cas Systems; Endodeoxyribonucleases/genetics; Gene Editing/methods; Genetic Engineering/methods; Genome, Plant; Oryza/genetics; Agrobacterium tumefaciens; Alleles; Arabidopsis/metabolism; Bacterial Proteins/metabolism; Base Sequence; CRISPR-Associated Protein 9/genetics; CRISPR-Associated Protein 9/metabolism; CRISPR-Associated Proteins/metabolism; Clustered Regularly Interspaced Short Palindromic Repeats; Crops, Agricultural; Endodeoxyribonucleases/metabolism; Humans; Isoenzymes/genetics; Isoenzymes/metabolism; Oryza/metabolism; Plants, Genetically Modified; RNA, Guide, Kinetoplastida/genetics; RNA, Guide, Kinetoplastida/metabolism; Sequence Alignment
11.
BMC Bioinformatics ; 20(1): 421, 2019 Aug 13.
Article in English | MEDLINE | ID: mdl-31409274

ABSTRACT

BACKGROUND: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct usage of pseudo-alignment to other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step. RESULTS: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments given a transcriptome reference library on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses - alternative splicing and gene differential expression - without the need of a transcript quantification step. Our experiments based on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing in the case of incomplete transcript annotations. CONCLUSIONS: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses.


Subject(s)
Algorithms; Transcriptome; Alternative Splicing; Animals; Area Under Curve; Drosophila/genetics; Humans; RNA/chemistry; RNA/metabolism; ROC Curve; Sequence Analysis, RNA
12.
Hum Mutat ; 40(9): 1215-1224, 2019 09.
Article in English | MEDLINE | ID: mdl-31301154

ABSTRACT

Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.


Subject(s)
Alternative Splicing; Computational Biology/methods; Mutation; Proteins/genetics; Animals; Congresses as Topic; Genetic Fitness; Humans; Models, Genetic; Sequence Homology, Nucleic Acid
13.
J Immunol ; 201(4): 1154-1164, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29997126

ABSTRACT

The uptake and destruction of bacteria by phagocytic cells is an essential defense mechanism in metazoans. To identify novel genes involved in the phagocytosis of Staphylococcus aureus, a major human pathogen, we assessed the phagocytic capacity of adult blood cells (hemocytes) of the fruit fly, Drosophila melanogaster, by testing several lines of the Drosophila Genetic Reference Panel. Natural genetic variation in the gene RNA-binding Fox protein 1 (Rbfox1) correlated with low phagocytic capacity in hemocytes, pointing to Rbfox1 as a candidate regulator of phagocytosis. Loss of Rbfox1 resulted in increased expression of the Ig superfamily member Down syndrome adhesion molecule 4 (Dscam4). Silencing of Dscam4 in Rbfox1-depleted blood cells rescued the fly's cellular immune response to S. aureus, indicating that downregulation of Dscam4 by Rbfox1 is critical for S. aureus phagocytosis in Drosophila. To our knowledge, this study is the first to demonstrate a link between Rbfox1, Dscam4, and host defense against S. aureus.


Subject(s)
Drosophila Proteins/metabolism; Drosophila melanogaster/immunology; Hemocytes/immunology; Immunity, Cellular; RNA Splicing Factors/metabolism; RNA-Binding Proteins/metabolism; Staphylococcal Infections/immunology; Staphylococcus aureus/physiology; Animals; Cell Adhesion Molecules/genetics; Cell Adhesion Molecules/metabolism; Drosophila Proteins/genetics; Gene Knockout Techniques; Humans; Phagocytosis; RNA Splicing Factors/genetics; RNA-Binding Proteins/genetics; Staphylococcal Infections/genetics
14.
Plant Physiol ; 178(1): 202-216, 2018 09.
Article in English | MEDLINE | ID: mdl-29991484

ABSTRACT

The diploid strawberry, Fragaria vesca, is a developing model system for the economically important Rosaceae family. Strawberry fleshy fruit develops from the floral receptacle and its ripening is nonclimacteric. The external seed configuration of strawberry fruit facilitates the study of seed-to-fruit cross tissue communication, particularly phytohormone biosynthesis and transport. To investigate strawberry fruit development, we previously generated spatial and temporal transcriptome data profiling F. vesca flower and fruit development pre- and postfertilization. In this study, we combined 46 of our existing RNA-seq libraries to generate coexpression networks using the Weighted Gene Co-Expression Network Analysis package in R. We then applied a post-hoc consensus clustering approach and used bootstrapping to demonstrate consensus clustering's ability to produce robust and reproducible clusters. Further, we experimentally tested hypotheses based on the networks, including increased iron transport from the receptacle to the seed postfertilization and characterized a F. vesca floral mutant and its candidate gene. To increase their utility, the networks are presented in a web interface (www.fv.rosaceaefruits.org) for easy exploration and identification of coexpressed genes. Together, the work reported here illustrates ways to generate robust networks optimized for the mining of large transcriptome data sets, thereby providing a useful resource for hypothesis generation and experimental design in strawberry and related Rosaceae fruit crops.


Subject(s)
Flowers/genetics; Fragaria/genetics; Fruit/genetics; Gene Expression Regulation, Developmental; Gene Expression Regulation, Plant; Gene Regulatory Networks; Amino Acid Sequence; Cluster Analysis; Flowers/growth & development; Fragaria/growth & development; Fruit/growth & development; Gene Expression Profiling/methods; Gene Ontology; Genes, Plant/genetics; Mutation; Plant Proteins/genetics; Sequence Homology, Amino Acid
15.
BMC Genomics ; 18(1): 772, 2017 Oct 11.
Article in English | MEDLINE | ID: mdl-29020934

ABSTRACT

BACKGROUND: Regulation of pre-mRNA splicing diversifies protein products and affects many biological processes. Arabidopsis thaliana Serine/Arginine-rich 45 (SR45) regulates pre-mRNA splicing by interacting with other regulatory proteins and spliceosomal subunits. Although SR45 has orthologs in diverse eukaryotes, including human RNPS1, the sr45-1 null mutant is viable. Narrow flower petals and reduced seed formation suggest that SR45 regulates genes involved in diverse processes, including reproduction. To understand how SR45 is involved in the regulation of reproductive processes, we studied mRNA from the wild-type and sr45-1 inflorescences using RNA-seq, and identified SR45-bound RNAs by immunoprecipitation. RESULTS: Using a variety of bioinformatics tools, we identified a total of 358 SR45 differentially regulated (SDR) genes, 542 SR45-dependent alternative splicing (SAS) events, and 1812 SR45-associated RNAs (SARs). There is little overlap between SDR genes and SAS genes, and neither set of genes is enriched for flower or seed development. However, transcripts from reproductive process genes are significantly overrepresented in SARs. In exploring the fate of SARs, we found that a total of 81 SARs are subject to alternative splicing, while 14 of them are known Nonsense-Mediated Decay (NMD) targets. Motifs related to GGNGG are enriched both in SARs and near different types of SAS events, suggesting that SR45 recognizes this motif directly. Genes involved in plant defense are significantly overrepresented among genes whose expression is suppressed by SR45, and sr45-1 plants do indeed show enhanced immunity. CONCLUSION: We find that SR45 is a suppressor of innate immunity. We find that a single motif (GGNGG) is highly enriched both in RNAs bound by SR45 and in sequences near SR45-dependent alternative splicing events in inflorescence tissue. We find that the alternative splicing events regulated by SR45 are enriched for this motif whether the effect of SR45 is activation or repression of the particular event. Thus, our data suggest that SR45 acts to control splice site choice in a way that defies simple categorization as an activator or repressor of splicing.


Subject(s)
Arabidopsis/genetics; Arabidopsis/immunology; Gene Expression Profiling; Immunity, Innate/genetics; RNA Splicing; Arabidopsis/microbiology; Flowers/genetics
16.
Genetics ; 204(1): 57-75, 2016 09.
Article in English | MEDLINE | ID: mdl-27356612

ABSTRACT

Close relatives can share large segments of their genome identical by descent (IBD) that can be identified in genome-wide polymorphism data sets. There are a range of methods to use these IBD segments to identify relatives and estimate their relationship. These methods have focused on sharing on the autosomes, as they provide a rich source of information about genealogical relationships. We hope to learn additional information about recent ancestry through shared IBD segments on the X chromosome, but currently lack the theoretical framework to use this information fully. Here, we fill this gap by developing probability distributions for the number and length of X chromosome segments shared IBD between an individual and an ancestor k generations back, as well as between half- and full-cousin relationships. Due to the inheritance pattern of the X and the fact that X homologous recombination occurs only in females (outside of the pseudoautosomal regions), the number of females along a genealogical lineage is a key quantity for understanding the number and length of the IBD segments shared among relatives. When inferring relationships among individuals, the number of female ancestors along a genealogical lineage will often be unknown. Therefore, our IBD segment length and number distributions marginalize over this unknown number of recombinational meioses through a distribution of recombinational meioses we derive. By using Bayes' theorem to invert these distributions, we can estimate the number of female ancestors between two relatives, giving us details about the genealogical relations between individuals not possible with autosomal data alone.
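
The inheritance pattern described in this abstract can be explored by enumerating which ancestral lineages are able to transmit X-chromosome material. The sketch below is illustrative and not the authors' derivation: the names `x_lineages` and `recombinational_meioses` are hypothetical, pseudoautosomal regions are ignored, and each female ancestor on a lineage is assumed to contribute one recombinational meiosis.

```python
from itertools import product


def x_lineages(k, start_sex="F"):
    """List sex sequences of ancestors, 1 to k generations back, that can pass on X.

    A lineage is written child-to-ancestor, e.g. "FM" means the mother's
    father. A father never passes his X to a son, so any "male child,
    male parent" step rules the lineage out.
    """
    lineages = []
    for path in product("FM", repeat=k):
        chain = (start_sex,) + path  # present-day individual first
        if any(child == "M" and parent == "M"
               for child, parent in zip(chain, chain[1:])):
            continue
        lineages.append("".join(path))
    return lineages


def recombinational_meioses(lineage):
    """Count female ancestors on a lineage (assumed one recombining meiosis each)."""
    return lineage.count("F")


# For a female, the number of possible X lineages grows as 2, 3, 5, ...
counts = [len(x_lineages(k)) for k in (1, 2, 3)]
```

Tabulating `recombinational_meioses` over all lineages of a given length gives an empirical view of how the number of females varies when the true lineage is unknown.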


Subject(s)
Chromosomes, Human, X; Inheritance Patterns; Bayes Theorem; Chromosomes, Human, X/genetics; Female; Genealogy and Heraldry; Genetic Variation; Genetics, Population/methods; Genetics, Population/statistics & numerical data; Genome-Wide Association Study; Humans; Male; Models, Genetic; Pedigree
17.
Mol Biol Cell ; 26(20): 3557-60, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26463979

ABSTRACT

Thirty-five years ago, as young graduate students, we had the pleasure and privilege of being in Joan Steitz's laboratory at a pivotal point in the history of RNA molecular biology. Introns had recently been discovered in the laboratories of Philip Sharp and Richard Roberts, but the machinery for removing them from mRNA precursors was entirely unknown. This Retrospective describes our hypothesis that recently discovered snRNPs functioned in pre-mRNA splicing. The proposal was proven correct, as has Joan's intuition that small RNAs provide specificity to RNA processing reactions through base pairing in diverse settings. However, research over the intervening years has revealed that both splice site selection and splicing itself are much more complex and dynamic than we imagined.


Subject(s)
Molecular Biology/history; Ribonucleoproteins, Small Nuclear/genetics; Ribonucleoproteins, Small Nuclear/history; Animals; Anniversaries and Special Events; History, 20th Century; History, 21st Century; Humans; RNA Splicing/genetics; RNA, Messenger/genetics; Spliceosomes/genetics
19.
BMC Bioinformatics ; 16: 218, 2015 Jul 10.
Article in English | MEDLINE | ID: mdl-26160651

ABSTRACT

BACKGROUND: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. RESULTS: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios. CONCLUSIONS: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with 25M+ edges connecting the 60k+ KOG sequences in half a minute using less than half a gigabyte of memory.
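
Two of the edge-weighting metrics named in this abstract are simple enough to sketch from a BLAST hit's bit score and E-value. The code below is not taken from Porthos: the function names and the cap applied when BLAST reports an E-value of 0 are assumptions, and because the abstract does not define BSR, the sketch assumes the common convention of dividing a pair's bit score by the query's self-hit bit score.

```python
import math


def nle(evalue, cap=200.0):
    """Negative common log of the expectation value (NLE).

    BLAST can report an E-value of 0 for very strong hits, where the log is
    undefined; the cap used here is an arbitrary illustrative choice.
    """
    if evalue <= 0.0:
        return cap
    return min(cap, -math.log10(evalue))


def bit_score_ratio(pair_bits, self_bits):
    """Bit score ratio (BSR), assuming normalization by the self-hit score."""
    if self_bits <= 0:
        raise ValueError("self-hit bit score must be positive")
    return pair_bits / self_bits


# Hypothetical hit values, used only to show the scale of each metric.
edge_nle = nle(1e-5)                      # about 5
edge_bsr = bit_score_ratio(50.0, 100.0)   # 0.5
```

Using the raw bit score as the edge weight needs no transformation at all, which is consistent with the abstract's point about a more tractable metric.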


Subject(s)
Algorithms; Cluster Analysis; Computational Biology/methods; Markov Chains; Proteins/chemistry; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Databases, Factual; Humans; Molecular Sequence Data; Proteins/metabolism; Sequence Homology, Amino Acid; Software
20.
BMC Genomics ; 16 Suppl 8: S4, 2015.
Article in English | MEDLINE | ID: mdl-26110739

ABSTRACT

BACKGROUND: There are now over 2000 loci in the human genome where genome-wide association studies (GWAS) have found one or more SNPs to be associated with altered risk of a complex trait disease. At each of these loci, there must be some molecular-level mechanism relevant to the disease. What are these mechanisms and how do they contribute to disease? RESULTS: Here we consider the roles of three primary mechanism classes: changes that directly alter protein function (missense SNPs), changes that alter transcript abundance as a consequence of variants close by in sequence, and changes that affect splicing. Missense SNPs are divided into those predicted to have a high impact on in vivo protein function, and those with a low impact. Splicing is divided into SNPs with a direct impact on splice sites, and those with a predicted effect on auxiliary splicing signals. The analysis was based on associations found for seven complex trait diseases in the classic Wellcome Trust Case Control Consortium (WTCCC1) GWA study and subsequent studies and meta-analyses, collected from the GWAS catalog. Linkage disequilibrium information was used to identify possible candidate SNPs for involvement in disease mechanism in each of the 356 loci associated with these seven diseases. With the parameters used, we find that 76% of loci have at least one of these mechanisms. Overall, except for the low incidence of direct impact on splice sites, the mechanisms are found at similar frequencies, with changes in transcript abundance the most common. But the distribution of mechanisms over diseases varies markedly, as does the fraction of loci with assigned mechanisms. Many of the implicated proteins have previously been suggested as relevant, but the specific mechanism assignments are new. In addition, a number of new disease-relevant proteins are proposed. CONCLUSIONS: The high fraction of GWAS loci with proposed mechanisms suggests that these classes of mechanism play a major role. Other mechanism types, such as variants affecting expression of genes remote in the DNA sequence, will contribute in other loci. Each of the identified putative mechanisms provides a hypothesis for further investigation.


Subject(s)
Gene Expression; Genome-Wide Association Study; Metabolic Diseases/genetics; Mutation, Missense; Polymorphism, Single Nucleotide; RNA Splicing; Genotype; Humans; Phenotype; Protein Isoforms/genetics; Quantitative Trait Loci