Results 1-20 of 29
1.
Biol Open ; 12(10)2023 10 15.
Article in English | MEDLINE | ID: mdl-37815090

ABSTRACT

Genetic variants affecting Heterogeneous Nuclear Ribonucleoprotein U (HNRNPU) have been identified in several neurodevelopmental disorders (NDDs). HNRNPU is widely expressed in the human brain and shows the highest postnatal expression in the cerebellum. Recent studies have investigated the role of HNRNPU in cerebral cortical development, but the effects of HNRNPU deficiency on cerebellar development remain unknown. Here, we describe the molecular and cellular outcomes of HNRNPU locus deficiency during in vitro neural differentiation of patient-derived and isogenic neuroepithelial stem cells with a hindbrain profile. We demonstrate that HNRNPU deficiency leads to chromatin remodeling of A/B compartments and transcriptional rewiring, partly by impacting exon inclusion during mRNA processing. Genomic regions affected by the chromatin restructuring and host genes of exon usage differences show a strong enrichment for genes implicated in epilepsies, intellectual disability, and autism. Lastly, we show that, at the cellular level, HNRNPU downregulation leads to an increased fraction of neural progenitors in the maturing neuronal population. We conclude that the HNRNPU locus is involved in delayed commitment of neural progenitors to differentiate into cell types with a hindbrain profile.


Subjects
Heterogeneous-Nuclear Ribonucleoprotein Group U, Neurodevelopmental Disorders, Humans, Chromatin, Heterogeneous-Nuclear Ribonucleoprotein Group U/genetics, Heterogeneous-Nuclear Ribonucleoprotein Group U/metabolism, Neurodevelopmental Disorders/genetics, Neurogenesis/genetics, Rhombencephalon/metabolism
2.
Sci Adv ; 9(30): eadg1805, 2023 07 28.
Article in English | MEDLINE | ID: mdl-37506213

ABSTRACT

Posttranscriptional modifications of mRNA have emerged as regulators of gene expression. Although pseudouridylation is the most abundant, its biological role remains poorly understood. Here, we demonstrate that the pseudouridine synthase dyskerin associates with RNA polymerase II, binds to thousands of mRNAs, and is responsible for their pseudouridylation, an action that occurs in chromatin and does not appear to require a guide RNA with full complementarity. In cells lacking dyskerin, mRNA pseudouridylation is reduced, while at the same time, de novo protein synthesis is enhanced, indicating that this modification interferes with translation. Accordingly, mRNAs with fewer pseudouridines due to knockdown of dyskerin are translated more efficiently. Moreover, mRNA pseudouridylation is severely reduced in patients with dyskeratosis congenita caused by inherited mutations in the gene encoding dyskerin (i.e., DKC1). Our findings demonstrate that pseudouridylation by dyskerin modulates mRNA translatability, with important implications for both normal development and disease.


Subjects
Nuclear Proteins, RNA-Binding Proteins, Humans, RNA, Messenger/genetics, RNA-Binding Proteins/genetics, Nuclear Proteins/genetics, Nuclear Proteins/metabolism, Cell Cycle Proteins/metabolism
3.
Cell ; 185(16): 3025-3040.e6, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35882231

ABSTRACT

Non-allelic recombination between homologous repetitive elements contributes to evolution and human genetic disorders. Here, we combine short- and long-DNA read sequencing of repeat elements with a new bioinformatics pipeline to show that somatic recombination of Alu and L1 elements is widespread in the human genome. Our analysis uncovers tissue-specific non-allelic homologous recombination hallmarks; moreover, we find that centromeres and cancer-associated genes are enriched for retroelements that may act as recombination hotspots. We compare recombination profiles in human-induced pluripotent stem cells and differentiated neurons and find that the neuron-specific recombination of repeat elements accompanies chromatin changes during cell-fate determination. Finally, we report that somatic recombination profiles are altered in Parkinson's and Alzheimer's disease, suggesting a link between retroelement recombination and genomic instability in neurodegeneration. This work highlights a significant contribution of the somatic recombination of repeat elements to genomic diversity in health and disease.


Subjects
Genome, Human, Retroelements, Alu Elements/genetics, Homologous Recombination, Humans, Long Interspersed Nucleotide Elements, Repetitive Sequences, Nucleic Acid
4.
Sci Data ; 9(1): 400, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35821502

ABSTRACT

Endogenous DNA double-strand breaks (DSBs) occurring in neural cells have been implicated in the pathogenesis of neurodevelopmental disorders (NDDs). Currently, a genomic map of endogenous DSBs arising during human neurogenesis is missing. Here, we applied in-suspension Breaks Labeling In Situ and Sequencing (sBLISS), RNA-Seq, and Hi-C to chart the genomic landscape of DSBs and relate it to gene expression and genome architecture in 2D cultures of human neuroepithelial stem cells (NES), neural progenitor cells (NPC), and post-mitotic neural cells (NEU). Endogenous DSBs were enriched at the promoter and along the gene body of transcriptionally active genes, at the borders of topologically associating domains (TADs), and around chromatin loop anchors. NDD risk genes harbored significantly more DSBs in comparison to other protein-coding genes, especially in NEU cells. We provide sBLISS, RNA-Seq, and Hi-C datasets for each differentiation stage, and all the scripts needed to reproduce our analyses. Our datasets and tools represent a unique resource that can be harnessed to investigate the role of genome fragility in the pathogenesis of NDDs.


Subjects
DNA Breaks, Double-Stranded, Neurogenesis, Cell Line, Tumor, DNA/metabolism, Genomics, Humans
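The abstract reports that NDD risk genes harbor more DSBs than other protein-coding genes. Raw per-gene counts tend to favor long genes, so one common way to frame such a comparison is DSB density (breaks per kilobase). The sketch below is a minimal illustration of that idea, assuming each DSB is reduced to a single genomic position and genes are 0-based half-open intervals; the function names and toy data are hypothetical, and the statistic used in the paper and its published scripts may differ.

```python
from bisect import bisect_left
from statistics import mean


def dsb_density_per_kb(
    genes: dict[str, tuple[str, int, int]],
    dsb_positions: dict[str, list[int]],
) -> dict[str, float]:
    """Return DSBs per kilobase for each gene.

    genes: gene name -> (chrom, start, end), 0-based half-open coordinates.
    dsb_positions: chrom -> DSB positions (hypothetical input format).
    """
    sorted_sites = {chrom: sorted(pos) for chrom, pos in dsb_positions.items()}
    density = {}
    for name, (chrom, start, end) in genes.items():
        sites = sorted_sites.get(chrom, [])
        # Binary search counts the sites with start <= pos < end.
        n_breaks = bisect_left(sites, end) - bisect_left(sites, start)
        density[name] = n_breaks / ((end - start) / 1000)
    return density


def compare_gene_sets(density: dict[str, float], risk_genes: set[str]) -> tuple[float, float]:
    """Mean DSB density of the risk-gene set versus all remaining genes."""
    risk = [d for gene, d in density.items() if gene in risk_genes]
    other = [d for gene, d in density.items() if gene not in risk_genes]
    return mean(risk), mean(other)


# Toy example: G1 stands in for a "risk" gene.
genes = {"G1": ("chr1", 0, 2000), "G2": ("chr1", 5000, 6000), "G3": ("chr2", 0, 4000)}
dsbs = {"chr1": [100, 1500, 1999, 2000, 5500], "chr2": [10]}
density = dsb_density_per_kb(genes, dsbs)
risk_mean, other_mean = compare_gene_sets(density, {"G1"})
```

A real analysis would also need a significance test and controls for expression level, since the abstract notes that DSBs concentrate in transcriptionally active genes.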
5.
Plants (Basel) ; 10(12)2021 Dec 03.
Article in English | MEDLINE | ID: mdl-34961135

ABSTRACT

Dehydration proteins (dehydrins, DHNs) confer tolerance to water-stress deficit in plants. We performed a comparative genomics and evolutionary study of DHN genes in four model Brachypodium grass species. Due to limited knowledge on dehydrin expression under water deprivation stress in Brachypodium, we also performed a drought-induced gene expression analysis in 32 ecotypes of the genus' flagship species B. distachyon showing different hydric requirements. Genomic sequence analysis detected 10 types of dehydrin genes (Bdhn) across the Brachypodium species. Domain and conserved motif contents of peptides encoded by Bdhn genes revealed eight protein architectures. Bdhn genes were spread across several chromosomes. Selection analysis indicated that all the Bdhn genes were constrained by purifying selection. Three upstream cis-regulatory motifs (BES1, MYB124, ZAT) were detected in several Bdhn genes. Gene expression analysis demonstrated that only four Bdhn1-Bdhn2, Bdhn3, and Bdhn7 genes, orthologs of wheat, barley, rice, sorghum, and maize genes, were expressed in mature leaves of B. distachyon and that all of them were more highly expressed in plants under drought conditions. Brachypodium dehydrin expression was significantly correlated with drought-response phenotypic traits (plant biomass, leaf carbon and proline contents and water use efficiency increases, and leaf water and nitrogen content decreases) being more pronounced in drought-tolerant ecotypes. Our results indicate that dehydrin type and regulation could be a key factor determining the acquisition of water-stress tolerance in grasses.

6.
Front Oncol ; 11: 700568, 2021.
Article in English | MEDLINE | ID: mdl-34395272

ABSTRACT

Somatic copy number alterations (SCNAs) are a pervasive trait of human cancers that contributes to tumorigenesis by affecting the dosage of multiple genes at the same time. In the past decade, The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) initiatives have generated and made publicly available SCNA genomic profiles from thousands of tumor samples across multiple cancer types. Here, we present a comprehensive analysis of 853,218 SCNAs across 10,729 tumor samples belonging to 32 cancer types using TCGA data. We then discuss current models for how SCNAs likely arise during carcinogenesis and how genomic SCNA profiles can inform clinical practice. Lastly, we highlight open questions in the field of cancer-associated SCNAs.

8.
Genome Biol ; 22(1): 136, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33952325

ABSTRACT

BACKGROUND: Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear. RESULTS: We hypothesize that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the "fuzzy" transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assemble a dataset of more than 2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validate the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analyze the nuclear localization and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either on-chromatin by XRN2 or off-chromatin by the exosome. CONCLUSIONS: We provide a curated atlas of intergenic RNAs that distinguishes between alternative processing of well-annotated genes from independent transcriptional units based on the combined analysis of chromatin signatures, nuclear RNA localization, and degradation pathways.


Subjects
DNA, Intergenic/genetics, Genes, RNA, Messenger/genetics, Cell Line, Chromatin/genetics, Endonucleases/metabolism, Humans, RNA, Messenger/metabolism, Transcription, Genetic
9.
Front Genet ; 12: 618659, 2021.
Article in English | MEDLINE | ID: mdl-33603776

ABSTRACT

New High-Performance Computing architectures have recently been developed for commercial central processing units (CPUs). Yet, this has not improved the execution time of widely used bioinformatics applications, like BLAST+. This is due to a lack of optimization between the bases of the existing algorithms and the internals of the hardware that would allow taking full advantage of the available CPU cores. To optimize for the new architectures, algorithms must be revised and redesigned, usually rewritten from scratch. BLVector adapts the high-level concepts of BLAST+ to x86 architectures with AVX-512 to harness their capabilities. A comprehensive study was carried out to optimize the approach, resulting in a significant reduction in execution time. BLVector reduces the execution time of BLAST+ when aligning up to mid-size protein sequences (∼750 amino acids). The gain in real-world scenarios is 3.2-fold. When applied to longer proteins, BLVector consumes more time than BLAST+, but retrieves a much larger set of results. BLVector and BLAST+ are fine-tuned heuristics. Therefore, the relevant results returned by both are the same, although they behave differently, especially when performing alignments with low scores. Hence, they can be considered complementary bioinformatics tools.

10.
Nat Protoc ; 15(12): 3894-3941, 2020 12.
Article in English | MEDLINE | ID: mdl-33139954

ABSTRACT

sBLISS (in-suspension breaks labeling in situ and sequencing) is a versatile and widely applicable method for identification of endogenous and induced DNA double-strand breaks (DSBs) in any cell type that can be brought into suspension. sBLISS provides genome-wide profiles of the most consequential DNA lesion implicated in a variety of pathological, but also physiological, processes. In sBLISS, after in situ labeling, DSB ends are linearly amplified, followed by next-generation sequencing and DSB landscape analysis. Here, we present a step-by-step experimental protocol for sBLISS, as well as a basic computational analysis. The main advantages of sBLISS are (i) the suspension setup, which renders the protocol user-friendly and easily scalable; (ii) the possibility of adapting it to a high-throughput or single-cell workflow; and (iii) its flexibility and its applicability to virtually every cell type, including patient-derived cells, organoids, and isolated nuclei. The wet-lab protocol can be completed in 1.5 weeks and is suitable for researchers with intermediate expertise in molecular biology and genomics. For the computational analyses, basic-to-intermediate bioinformatics expertise is required.


Subjects
DNA Breaks, Double-Stranded, Genomics/methods, Base Sequence, Cell Line, Suspensions
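After DSB ends are labeled, amplified and sequenced, a basic analysis step is to turn reads into counts of distinct DSB ends per genomic site. The sketch below shows one way to do that, assuming (this is not stated in the abstract) that each aligned read carries a unique molecular identifier (UMI) from the ligated adapter, so that reads sharing chromosome, position, strand and UMI are treated as copies of one labeled end. The `AlignedRead` record is hypothetical, and exact UMI matching is a simplification; production pipelines often tolerate UMI sequencing errors.

```python
from collections import Counter
from typing import NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal record for one aligned read."""
    chrom: str
    pos: int      # genomic coordinate of the labeled DSB end
    strand: str   # "+" or "-"
    umi: str      # unique molecular identifier (assumed present)


def count_unique_dsb_ends(reads: list[AlignedRead]) -> Counter:
    """Collapse reads with identical (chrom, pos, strand, UMI), then count
    the distinct molecules observed at each (chrom, pos, strand) site."""
    unique_molecules = {(r.chrom, r.pos, r.strand, r.umi) for r in reads}
    return Counter((chrom, pos, strand) for chrom, pos, strand, _ in unique_molecules)


reads = [
    AlignedRead("chr1", 100, "+", "AAAA"),
    AlignedRead("chr1", 100, "+", "AAAA"),  # amplification copy of the read above
    AlignedRead("chr1", 100, "+", "CCCC"),  # a second, independent end at the same site
    AlignedRead("chr2", 50, "-", "AAAA"),
]
per_site = count_unique_dsb_ends(reads)
```

Here the duplicate read is collapsed, so the first site is counted twice rather than three times.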
11.
Nat Biotechnol ; 38(10): 1184-1193, 2020 10.
Article in English | MEDLINE | ID: mdl-32451505

ABSTRACT

With the exception of lamina-associated domains, the radial organization of chromatin in mammalian cells remains largely unexplored. Here we describe genomic loci positioning by sequencing (GPSeq), a genome-wide method for inferring distances to the nuclear lamina all along the nuclear radius. GPSeq relies on gradual restriction digestion of chromatin from the nuclear lamina toward the nucleus center, followed by sequencing of the generated cut sites. Using GPSeq, we mapped the radial organization of the human genome at 100-kb resolution, which revealed radial patterns of genomic and epigenomic features and gene expression, as well as A and B subcompartments. By combining radial information with chromosome contact frequencies measured by Hi-C, we substantially improved the accuracy of whole-genome structure modeling. Finally, we charted the radial topography of DNA double-strand breaks, germline variants and cancer mutations and found that they have distinctive radial arrangements in A and B subcompartments. We conclude that GPSeq can reveal fundamental aspects of genome architecture.


Subjects
Cell Nucleus/genetics, Chromatin/genetics, Epigenomics, Genome, Human/genetics, Gene Expression Regulation/genetics, High-Throughput Nucleotide Sequencing, Humans
12.
Nat Commun ; 11(1): 1018, 2020 02 24.
Article in English | MEDLINE | ID: mdl-32094342

ABSTRACT

Mammalian genomes encode tens of thousands of noncoding RNAs. Most noncoding transcripts exhibit nuclear localization and several have been shown to play a role in the regulation of gene expression and chromatin remodeling. To investigate the function of such RNAs, methods to massively map the genomic interacting sites of multiple transcripts have been developed; however, these methods have some limitations. Here, we introduce RNA And DNA Interacting Complexes Ligated and sequenced (RADICL-seq), a technology that maps genome-wide RNA-chromatin interactions in intact nuclei. RADICL-seq is a proximity ligation-based methodology that reduces the bias for nascent transcription, while increasing genomic coverage and unique mapping rate efficiency compared with existing methods. RADICL-seq identifies distinct patterns of genome occupancy for different classes of transcripts as well as cell type-specific RNA-chromatin interactions, and highlights the role of transcription in the establishment of chromatin structure.


Subjects
Chromatin/metabolism, Chromosome Mapping/methods, High-Throughput Nucleotide Sequencing/methods, RNA, Untranslated/genetics, Sequence Analysis, RNA/methods, Animals, Cell Line, Cell Nucleus/genetics, Cell Nucleus/metabolism, Chromatin/genetics, Chromatin Assembly and Disassembly/genetics, Gene Library, Mice, Mouse Embryonic Stem Cells, RNA, Untranslated/metabolism, Transcription, Genetic
13.
Nat Commun ; 10(1): 1636, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30967549

ABSTRACT

DNA fluorescence in situ hybridization (DNA FISH) is a powerful method to study chromosomal organization in single cells. At present, there is a lack of free resources of DNA FISH probes and probe design tools which can be readily applied. Here, we describe iFISH, an open-source repository currently comprising 380 DNA FISH probes targeting multiple loci on the human autosomes and chromosome X, as well as a genome-wide database of optimally designed oligonucleotides and a freely accessible web interface ( http://ifish4u.org ) that can be used to design DNA FISH probes. We individually validate 153 probes and take advantage of our probe repository to quantify the extent of intermingling between multiple heterologous chromosome pairs, showing a much higher extent of intermingling in human embryonic stem cells compared to fibroblasts. In conclusion, iFISH is a versatile and expandable resource, which can greatly facilitate the use of DNA FISH in research and diagnostics.


Subjects
DNA Probes/genetics, Databases, Nucleic Acid, Genome, Human/genetics, In Situ Hybridization, Fluorescence/methods, A549 Cells, Chromosome Mapping/methods, Chromosomes, Human/genetics, Fibroblasts, Human Embryonic Stem Cells, Humans, Oligonucleotides/genetics, Real-Time Polymerase Chain Reaction/methods, Research Design
14.
Cell ; 174(5): 1067-1081.e17, 2018 08 23.
Article in English | MEDLINE | ID: mdl-30078707

ABSTRACT

Long mammalian introns make it challenging for the RNA processing machinery to identify exons accurately. We find that LINE-derived sequences (LINEs) contribute to this selection by recruiting dozens of RNA-binding proteins (RBPs) to introns. This includes MATR3, which promotes binding of PTBP1 to multivalent binding sites within LINEs. Both RBPs repress splicing and 3' end processing within and around LINEs. Notably, repressive RBPs preferentially bind to evolutionarily young LINEs, which are located far from exons. These RBPs insulate the LINEs and the surrounding intronic regions from RNA processing. Upon evolutionary divergence, changes in RNA motifs within LINEs lead to gradual loss of their insulation. Hence, older LINEs are located closer to exons, are a common source of tissue-specific exons, and increasingly bind to RBPs that enhance RNA processing. Thus, LINEs are hubs for the assembly of repressive RBPs and also contribute to the evolution of new, lineage-specific transcripts in mammals.


Subjects
Heterogeneous-Nuclear Ribonucleoproteins/chemistry, Long Interspersed Nucleotide Elements, Nuclear Matrix-Associated Proteins/chemistry, Polyadenylation, Polypyrimidine Tract-Binding Protein/chemistry, RNA-Binding Proteins/chemistry, RNA/chemistry, Alternative Splicing, Animals, Binding Sites, Exons, HeLa Cells, Humans, Introns, Mice, Mutation, Nucleotide Motifs, Phylogeny, Protein Binding, Protein Interaction Mapping, RNA Splicing
16.
BMC Genomics ; 15: 925, 2014 Oct 23.
Article in English | MEDLINE | ID: mdl-25341390

ABSTRACT

BACKGROUND: The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites. RESULTS: Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data including 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature. CONCLUSIONS: SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.


Subjects
Software, Algorithms, Base Sequence, Chromatin Immunoprecipitation, DNA/chemistry, DNA/metabolism, High-Throughput Nucleotide Sequencing, Internet, Protein Binding, Proteins/chemistry, Proteins/metabolism, RNA/chemistry, RNA/metabolism, Sequence Analysis, DNA, User-Computer Interface
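SeAMotE is described as a motif search "based on pattern occurrences" that reports discriminative motifs. The sketch below illustrates the general idea of occurrence-based discriminative motif scoring, not SeAMotE's actual scoring or statistics, which the abstract does not give. It assumes exact k-mers as candidate motifs, a positive set (for example ChIP or CLIP peaks) and a negative background set, and scores each k-mer by the difference between the fractions of positive and negative sequences that contain it. All names and toy sequences are hypothetical.

```python
def kmers_present(sequence: str, k: int) -> set[str]:
    """All distinct k-mers occurring at least once in one sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def occurrence_fraction(sequences: list[str], k: int) -> dict[str, float]:
    """Fraction of sequences in which each k-mer occurs at least once."""
    if not sequences:
        return {}
    counts: dict[str, int] = {}
    for seq in sequences:
        for kmer in kmers_present(seq, k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer: c / len(sequences) for kmer, c in counts.items()}


def discriminative_kmers(
    positives: list[str], negatives: list[str], k: int = 5, top: int = 5
) -> list[tuple[str, float]]:
    """Rank k-mers by (fraction in positives) - (fraction in negatives)."""
    pos = occurrence_fraction(positives, k)
    neg = occurrence_fraction(negatives, k)
    scores = {kmer: frac - neg.get(kmer, 0.0) for kmer, frac in pos.items()}
    # Highest score first; ties broken alphabetically for reproducible output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]


positives = ["ACGTGACGTG", "TTACGTGAA", "GGACGTGCC"]
negatives = ["TTTTAAAAGG", "CCCCGGGGAA", "ATATATATAT"]
ranking = discriminative_kmers(positives, negatives, k=5)
```

With these toy sets, `ACGTG` occurs in every positive sequence and in no negative one, so it ranks first with a score of 1.0. A real tool would also extend to degenerate patterns and assess significance.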
17.
Bioinformatics ; 30(20): 2975-7, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-24990610

ABSTRACT

SUMMARY: Here we introduce ccSOL omics, a webserver for large-scale calculations of protein solubility. Our method allows (i) proteome-wide predictions; (ii) identification of soluble fragments within each sequence; (iii) exhaustive single-point mutation analysis. RESULTS: Using coil/disorder, hydrophobicity, hydrophilicity, ß-sheet and α-helix propensities, we built a predictor of protein solubility. Our approach shows an accuracy of 79% on the training set (36 990 Target Track entries). Validation on three independent sets indicates that ccSOL omics discriminates soluble and insoluble proteins with an accuracy of 74% on 31 760 proteins sharing <30% sequence similarity. AVAILABILITY AND IMPLEMENTATION: ccSOL omics can be freely accessed on the web at http://s.tartaglialab.com/page/ccsol_group. Documentation and tutorial are available at http://s.tartaglialab.com/static_files/shared/tutorial_ccsol_omics.html. CONTACT: gian.tartaglia@crg.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Escherichia coli Proteins/chemistry, Escherichia coli Proteins/genetics, Escherichia coli/genetics, Gene Expression Regulation, Bacterial, Internet, Proteomics/methods, Algorithms, Gene Expression, Hydrophobic and Hydrophilic Interactions, Protein Structure, Secondary, Solubility
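ccSOL omics builds its predictor from several physico-chemical propensity scales and can locate soluble fragments within a sequence; the abstract does not give the scales, window sizes or weights. As a minimal sketch of the kind of per-residue profile such predictors start from, the code below averages one well-known scale (Kyte-Doolittle hydrophobicity, chosen here as an assumption) over a sliding window and reports the highest-scoring stretch. It is not the ccSOL model, and the function names are hypothetical.

```python
# Kyte-Doolittle hydrophobicity scale; used here only as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def propensity_profile(sequence: str, scale: dict[str, float], window: int = 7) -> list[float]:
    """Mean scale value of every full window along the sequence.

    Raises KeyError for residues missing from the scale (e.g. 'X').
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    values = [scale[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    # Running sum: add the residue entering the window, drop the one leaving it.
    current = sum(values[:window])
    profile = [current / window]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]
        profile.append(current / window)
    return profile


def most_hydrophobic_window(sequence: str, window: int = 7) -> tuple[int, float]:
    """Return (0-based start, mean score) of the highest-scoring window."""
    profile = propensity_profile(sequence, KYTE_DOOLITTLE, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    best = max(range(len(profile)), key=profile.__getitem__)
    return best, profile[best]


start, score = most_hydrophobic_window("DDDDKKKLLLLIIVDDD", window=5)
```

For this toy sequence, the window `LLIIV` starting at index 9 scores highest (mean 4.16). A multi-scale predictor would compute one such profile per scale and combine them with trained weights.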
18.
Biochim Biophys Acta ; 1844(9): 1662-74, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24982029

ABSTRACT

Urease, the most efficient enzyme so far discovered, depends on the presence of nickel ions in the catalytic site for its activity. The transformation of inactive apo-urease into active holo-urease requires the insertion of two Ni(II) ions in the substrate binding site, a process that involves the interaction of four accessory proteins named UreD, UreF, UreG and UreE. This study, carried out using calorimetric and NMR-based structural analysis, is focused on the interaction between UreE and UreG from Sporosarcina pasteurii, a highly ureolytic bacterium. Isothermal calorimetric protein-protein titrations revealed the occurrence of a binding event between SpUreE and SpUreG, entailing two independent steps with positive cooperativity (Kd1=42±9µM; Kd2=1.7±0.3µM). This was interpreted as indicating the formation of the (UreE)2(UreG)2 hetero-oligomer upon binding of two UreG monomers onto the pre-formed UreE dimer. The molecular details of this interaction were elucidated using high-resolution NMR spectroscopy. The occurrence of SpUreE chemical shift perturbations upon addition of SpUreG was investigated and analyzed to establish the protein-protein interaction site. The latter appears to involve the Ni(II) binding site as well as mobile portions on the C-terminal and the N-terminal domains. Docking calculations based on the information obtained from NMR provided a structural basis for the protein-protein contact site. The high sequence and structural similarity within these protein classes suggests a generality of the interaction mode among homologous proteins. The implications of these results on the molecular details of the urease activation process are considered and analyzed.


Subjects
Bacterial Proteins/chemistry, Carrier Proteins/chemistry, Nickel/chemistry, Sporosarcina/chemistry, Urease/chemistry, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Calorimetry, Carrier Proteins/genetics, Carrier Proteins/metabolism, Cations, Divalent, Escherichia coli/genetics, Escherichia coli/metabolism, Gene Expression, Kinetics, Magnetic Resonance Spectroscopy, Molecular Docking Simulation, Nickel/metabolism, Phosphate-Binding Proteins, Protein Binding, Protein Interaction Domains and Motifs, Protein Multimerization, Protein Structure, Secondary, Recombinant Proteins/chemistry, Recombinant Proteins/genetics, Recombinant Proteins/metabolism, Sporosarcina/enzymology, Thermodynamics, Urease/genetics, Urease/metabolism
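The two dissociation constants reported for SpUreE-SpUreG binding can be turned into a rough energetic estimate of the cooperativity. The worked sketch below assumes that the values are stepwise macroscopic constants for the sequential binding of two UreG monomers to the pre-formed UreE dimer, and assumes 298 K, since the abstract does not state the titration temperature. E and G abbreviate SpUreE and SpUreG.

```latex
\begin{aligned}
\mathrm{E_2} + \mathrm{G} &\rightleftharpoons \mathrm{E_2G}, &
K_{d1} &= \frac{[\mathrm{E_2}][\mathrm{G}]}{[\mathrm{E_2G}]} = 42 \pm 9\ \mu\mathrm{M} \\
\mathrm{E_2G} + \mathrm{G} &\rightleftharpoons \mathrm{E_2G_2}, &
K_{d2} &= \frac{[\mathrm{E_2G}][\mathrm{G}]}{[\mathrm{E_2G_2}]} = 1.7 \pm 0.3\ \mu\mathrm{M} \\
\frac{K_{d1}}{K_{d2}} &= \frac{42}{1.7} \approx 25 \\
\Delta\Delta G &= \Delta G_2 - \Delta G_1 = RT \ln\frac{K_{d2}}{K_{d1}}
  \approx \left(2.48\ \mathrm{kJ\,mol^{-1}}\right)\ln(0.040) \approx -8\ \mathrm{kJ\,mol^{-1}}
\end{aligned}
```

For two equivalent, non-interacting sites, statistical factors alone would give $K_{d2} = 4K_{d1}$. Under the stated assumptions, the observed $K_{d2}$ is instead about 25 times smaller than $K_{d1}$, which is consistent with the positive cooperativity reported above.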
19.
Mol Biosyst ; 10(7): 1632-42, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24756571

ABSTRACT

Coding and non-coding RNAs associate with proteins to perform important functions in the cell. Protein-RNA complexes are essential components of the ribosomal and spliceosomal machinery; they are involved in epigenetic regulation and form non-membrane-bound aggregates known as granules. Despite the functional importance of ribonucleoprotein interactions, the precise mechanisms of macromolecular recognition are still poorly understood. Here, we present the latest developments in experimental and computational investigation of protein-RNA interactions. We compare performances of different algorithms and discuss how predictive models allow the large-scale investigation of ribonucleoprotein associations. Specifically, we focus on approaches to decipher mechanisms regulating the activity of transcripts in protein networks. Finally, the catRAPID omics express method is introduced for the analysis of protein-RNA expression networks.


Subjects
Algorithms, Ribonucleoproteins/metabolism, Computational Biology, Models, Molecular, RNA/chemistry, Ribonucleoproteins/chemistry
20.
Bioinformatics ; 30(11): 1601-8, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24493033

ABSTRACT

MOTIVATION: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups. DESCRIPTION: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets. RESULTS: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations. AVAILABILITY: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite.


Subjects
Molecular Chaperones/chemistry, Proteins/chemistry, RNA-Binding Proteins/chemistry, Software, Algorithms, Intrinsically Disordered Proteins/chemistry, Molecular Chaperones/metabolism, Protein Structure, Secondary, RNA-Binding Proteins/metabolism, Sequence Analysis, Protein, Solubility