1.
Bioinformatics ; 36(Suppl_2): i745-i753, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381824

ABSTRACT

MOTIVATION: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. RESULTS: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. AVAILABILITY AND IMPLEMENTATION: https://github.com/shawn-peng/FDR-estimation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteomics , Tandem Mass Spectrometry , Algorithms , Databases, Protein , HeLa Cells , Humans , Peptides
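For orientation, the target-decoy baseline that the abstract above contrasts with decoy-free methods can be sketched as follows. This is a minimal illustration of one common TDA estimator (decoy hits divided by target hits above a score threshold, assuming separate target and decoy searches), not the authors' decoy-free framework; the function name and the scores are hypothetical.

```python
from typing import Sequence


def tda_fdr(target_scores: Sequence[float],
            decoy_scores: Sequence[float],
            threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Assumption: decoy matches above the threshold approximate the number
    of incorrect target matches above the same threshold.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing falsely discovered
    return min(1.0, n_decoy / n_target)


# Hypothetical PSM scores, for illustration only.
targets = [9.1, 8.4, 7.7, 6.0, 5.2, 4.8, 3.9, 3.1]
decoys = [5.0, 4.1, 3.5, 2.9, 2.2, 1.8, 1.0, 0.7]

fdr_loose = tda_fdr(targets, decoys, 4.0)   # 2 decoys / 6 targets
fdr_strict = tda_fdr(targets, decoys, 6.0)  # 0 decoys / 4 targets
```

Raising the threshold trades accepted identifications for a lower estimated FDR; with few decoy hits the estimate becomes coarse, which is one reason a model-based alternative may be attractive.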
2.
BMC Biol ; 16(1): 4, 2018 01 10.
Article in English | MEDLINE | ID: mdl-29325558

ABSTRACT

BACKGROUND: Transcription factors (TFs), the key players in transcriptional regulation, have attracted great experimental attention, yet the functions of most human TFs remain poorly understood. Recent capabilities in genome-wide protein binding profiling have stimulated systematic studies of the hierarchical organization of human gene regulatory network and DNA-binding specificity of TFs, shedding light on combinatorial gene regulation. We show here that these data also enable a systematic annotation of the biological functions and functional diversity of TFs. RESULT: We compiled a human gene regulatory network for 384 TFs covering the 146,096 TF-target gene (TF-TG) relationships, extracted from over 850 ChIP-seq experiments as well as the literature. By integrating this network of TF-TF and TF-TG relationships with 3715 functional concepts from six sources of gene function annotations, we obtained over 9000 confident functional annotations for 279 TFs. We observe extensive connectivity between TFs and Mendelian diseases, GWAS phenotypes, and pharmacogenetic pathways. Further, we show that TFs link apparently unrelated functions, even when the two functions do not share common genes. Finally, we analyze the pleiotropic functions of TFs and suggest that the increased number of upstream regulators contributes to the functional pleiotropy of TFs. CONCLUSION: Our computational approach is complementary to focused experimental studies on TF functions, and the resulting knowledge can guide experimental design for the discovery of unknown roles of TFs in human disease and drug response.


Subjects
Databases, Genetic , Gene Regulatory Networks/genetics , Genetic Predisposition to Disease/genetics , Transcription Factors/genetics , Humans , Transcription Factors/biosynthesis
3.
Mol Syst Biol ; 13(2): 913, 2017 02 13.
Article in English | MEDLINE | ID: mdl-28193641

ABSTRACT

The low costs of array-synthesized oligonucleotide libraries are empowering rapid advances in quantitative and synthetic biology. However, high synthesis error rates, uneven representation, and lack of access to individual oligonucleotides limit the true potential of these libraries. We have developed a cost-effective method called Recombinase Directed Indexing (REDI), which involves integration of a complex library into yeast, site-specific recombination to index library DNA, and next-generation sequencing to identify desired clones. We used REDI to generate a library of ~3,300 DNA probes that exhibited > 96% purity and remarkable uniformity (> 95% of probes within twofold of the median abundance). Additionally, we created a collection of ~9,000 individually accessible CRISPR interference yeast strains for > 99% of genes required for either fermentative or respiratory growth, demonstrating the utility of REDI for rapid and cost-effective creation of strain collections from oligonucleotide pools. Our approach is adaptable to any complex DNA library, and fundamentally changes how these libraries can be parsed, maintained, propagated, and characterized.


Subjects
Sequence Analysis, DNA/methods , Yeasts/genetics , CRISPR-Cas Systems , Computational Biology/methods , DNA, Fungal/genetics , Gene Library
4.
Fungal Genet Biol ; 89: 18-28, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26808821

ABSTRACT

Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, "activity-guided" approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent "genome mining" approaches first identify candidate BGCs, express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally-validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias toward the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. 
Our catalog was incorporated into the recently launched Minimum Information about Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and synthetic biology efforts toward discovering novel fungal enzymes and metabolites.


Subjects
Biological Products , Biosynthetic Pathways/genetics , Genes, Fungal , Genome, Fungal , Multigene Family , Alkaloids , Amino Acid Sequence , Computational Biology , Data Curation , Fungi/genetics , Phylogeny , Polyketides , Terpenes
5.
BMC Bioinformatics ; 13 Suppl 16: S4, 2012.
Article in English | MEDLINE | ID: mdl-23176300

ABSTRACT

Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.


Subjects
Computational Biology/methods , Databases, Protein/statistics & numerical data , Proteins/chemistry , Proteomics/statistics & numerical data , Chromatography, Liquid , Peptides/chemistry , Proteome/chemistry , Tandem Mass Spectrometry/statistics & numerical data
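To make the "combinatorial optimization" category in the review above concrete, the peptide-to-protein mapping step can be posed as a minimum set cover: find a small set of proteins that explains every identified peptide. The sketch below uses the standard greedy heuristic for set cover. It is an illustrative assumption about one common parsimony formulation, not necessarily the formulation the review presents, and the protein and peptide names are hypothetical.

```python
from typing import Dict, List, Set


def greedy_protein_cover(protein_to_peptides: Dict[str, Set[str]],
                         identified: Set[str]) -> List[str]:
    """Greedily pick proteins until every identified peptide is explained."""
    uncovered = set(identified)
    chosen: List[str] = []
    while uncovered:
        # Pick the protein explaining the most still-uncovered peptides;
        # iterating in sorted order breaks ties alphabetically.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        gain = protein_to_peptides[best] & uncovered
        if not gain:
            break  # remaining peptides map to no protein in the database
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical database: peptide "c" and peptide "d" are shared.
database = {
    "P1": {"a", "b", "c"},
    "P2": {"c", "d"},
    "P3": {"d"},
}
cover = greedy_protein_cover(database, {"a", "b", "c", "d"})
```

Shared peptides are what make the problem hard: here "d" is explained equally well by P2 and P3, and the greedy rule resolves the ambiguity only by a tie-break, whereas probabilistic approaches would instead assign each protein a confidence.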
6.
Bioinformatics ; 26(16): 1975-82, 2010 Aug 15.
Article in English | MEDLINE | ID: mdl-20551136

ABSTRACT

MOTIVATION: Enzyme catalysis is involved in numerous biological processes and the disruption of enzymatic activity has been implicated in human disease. Despite this, various aspects of catalytic reactions are not completely understood, such as the mechanics of reaction chemistry and the geometry of catalytic residues within active sites. As a result, the computational prediction of catalytic residues has the potential to identify novel catalytic pockets, aid in the design of more efficient enzymes and also predict the molecular basis of disease. RESULTS: We propose a new kernel-based algorithm for the prediction of catalytic residues based on protein sequence, structure and evolutionary information. The method relies upon explicit modeling of similarity between residue-centered neighborhoods in protein structures. We present evidence that this algorithm evaluates favorably against established approaches, and also provides insights into the relative importance of the geometry, physicochemical properties and evolutionary conservation of catalytic residue activity. The new algorithm was used to identify known mutations associated with inherited disease whose molecular mechanism might be predicted to operate specifically through the loss or gain of catalytic residues. It should, therefore, provide a viable approach to identifying the molecular basis of disease in which the loss or gain of function is not caused solely by the disruption of protein stability. Our analysis suggests that both mechanisms are actively involved in human inherited disease. AVAILABILITY AND IMPLEMENTATION: Source code for the structural kernel is available at www.informatics.indiana.edu/predrag/.


Subjects
Artificial Intelligence , Catalytic Domain , Enzymes/chemistry , Enzymes/genetics , Genetic Diseases, Inborn/genetics , Algorithms , Amino Acid Sequence , Catalysis , Computational Biology/methods , Genetic Diseases, Inborn/enzymology , Humans , Mutation , Proteins/chemistry , Proteins/genetics , Software
7.
J Proteome Res ; 9(12): 6288-97, 2010 Dec 03.
Article in English | MEDLINE | ID: mdl-21067214

ABSTRACT

Peptide detectability is defined as the probability that a peptide is identified in an LC-MS/MS experiment and has been useful in providing solutions to protein inference and label-free quantification. Previously, predictors for peptide detectability trained on standard or complex samples were proposed. Although the models trained on complex samples may benefit from the large training data sets, it is unclear to what extent they are affected by the unequal abundances of identified proteins. To address this challenge and improve detectability prediction, we present a new algorithm for the iterative learning of peptide detectability from complex mixtures. We provide evidence that the new method approximates detectability with useful accuracy and, based on its design, can be used to interpret the outcome of other learning strategies. We studied the properties of peptides from the bacterium Deinococcus radiodurans and found that at standard quantities, its tryptic peptides can be roughly classified as either detectable or undetectable, with a relatively small fraction having medium detectability. We extend the concept of detectability from peptides to proteins and apply the model to predict the behavior of a replicate LC-MS/MS experiment from a single analysis. Finally, our study summarizes a theoretical framework for peptide/protein identification and label-free quantification.


Subjects
Peptides/analysis , Proteins/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Algorithms , Bacterial Proteins/analysis , Chromatography, Liquid/methods , Deinococcus/metabolism , Neural Networks, Computer , Proteins/metabolism , Reproducibility of Results
8.
Anal Chem ; 82(15): 6559-68, 2010 Aug 01.
Article in English | MEDLINE | ID: mdl-20669997

ABSTRACT

A synthetic approach to model the analytical complexity of biological proteolytic digests has been developed. Combinatorial peptide libraries ranging in length between 9 and 12 amino acids that represent typical tryptic digests were designed, synthesized, and analyzed. Individual libraries and mixtures thereof were studied by replicate liquid chromatography-ion trap mass spectrometry and compared to a tryptic digest of Deinococcus radiodurans. Similar to complex proteome analysis, replicate study of individual libraries identified additional unique peptides. Fewer novel sequences were revealed with each additional analysis in a manner similar to that observed for biological data. Our results demonstrate a bimodal distribution of peptides sorting to either very low or very high levels of detection. Upon mixing of libraries at equal abundance, a length-dependent bias in favor of longer sequence identification was observed. Peptide identification as a function of site-specific amino acid content was characterized with certain amino acids proving to be of considerable importance. This report demonstrates that peptide libraries of defined character can serve as a reference for instrument characterization. Furthermore, they are uniquely suited to delineate the physical properties that influence identification of peptides, which provides a foundation for optimizing the study of samples with less defined heterogeneity.


Subjects
Peptides/chemistry , Proteomics/methods , Amino Acid Sequence , Chromatography, Liquid/methods , Deinococcus/metabolism , Mass Spectrometry/methods , Peptide Library , Peptides/chemical synthesis , Trypsin/metabolism
9.
Sci Adv ; 4(4): eaar5459, 2018 04.
Article in English | MEDLINE | ID: mdl-29651464

ABSTRACT

For decades, fungi have been a source of U.S. Food and Drug Administration-approved natural products such as penicillin, cyclosporine, and the statins. Recent breakthroughs in DNA sequencing suggest that millions of fungal species exist on Earth, with each genome encoding pathways capable of generating as many as dozens of natural products. However, the majority of encoded molecules are difficult or impossible to access because the organisms are uncultivable or the genes are transcriptionally silent. To overcome this bottleneck in natural product discovery, we developed the HEx (Heterologous EXpression) synthetic biology platform for rapid, scalable expression of fungal biosynthetic genes and their encoded metabolites in Saccharomyces cerevisiae. We applied this platform to 41 fungal biosynthetic gene clusters from diverse fungal species from around the world, 22 of which produced detectable compounds. These included novel compounds with unexpected biosynthetic origins, particularly from poorly studied species. This result establishes the HEx platform for rapid discovery of natural products from any fungal species, even those that are uncultivable, and opens the door to discovery of the next generation of natural products.


Subjects
Biological Products/metabolism , Fungi/genetics , Fungi/metabolism , Gene Expression , Genetic Engineering , Biosynthetic Pathways , Fermentation , Genetic Engineering/methods , High-Throughput Screening Assays , Promoter Regions, Genetic , Workflow
10.
Pac Symp Biocomput ; 21: 381-92, 2016.
Article in English | MEDLINE | ID: mdl-26776202

ABSTRACT

The causes of complex diseases are multifactorial and the phenotypes of complex diseases are typically heterogeneous, posing significant challenges for both experimental design and statistical inference in the study of such diseases. Transcriptome profiling can potentially provide key insights into the pathogenesis of diseases, but the signals from disease causes and consequences are intertwined, leaving it to speculation which are likely causal. Genome-wide association studies, on the other hand, provide direct evidence on the potential genetic causes of diseases, but they do not provide a comprehensive view of disease pathogenesis, and they have difficulty detecting weak signals from individual genes. Here we propose an approach, diseaseExPatho, that combines transcriptome data, regulome knowledge, and GWAS results if available, for separating the causes and consequences in the disease transcriptome. DiseaseExPatho computationally deconvolutes the expression data into gene expression modules, hierarchically ranks the modules based on the regulome using a novel algorithm, and, given GWAS data, directly labels the potential causal gene modules based on their correlations with genome-wide gene-disease associations. Strikingly, we observed that the putative causal modules are not necessarily differentially expressed in disease, while the other modules can show strong differential expression without enrichment of top GWAS variations. On the other hand, we showed that the regulatory-network-based module ranking prioritized the putative causal modules consistently in 6 diseases. We suggest that the approach is applicable to other common and rare complex diseases to prioritize causal pathways with or without genome-wide association studies.


Subjects
Genetic Predisposition to Disease , Transcriptome , Algorithms , Computational Biology/methods , Computational Biology/statistics & numerical data , Diabetes Mellitus/genetics , Gene Expression Profiling , Gene Regulatory Networks , Genome-Wide Association Study/statistics & numerical data , Humans , Mental Disorders/genetics , Models, Genetic , Multifactorial Inheritance , Unsupervised Machine Learning
11.
Nat Med ; 22(5): 547-56, 2016 05.
Article in English | MEDLINE | ID: mdl-27089514

ABSTRACT

Doxorubicin is an anthracycline chemotherapy agent effective in treating a wide range of malignancies, but it causes a dose-related cardiotoxicity that can lead to heart failure in a subset of patients. At present, it is not possible to predict which patients will be affected by doxorubicin-induced cardiotoxicity (DIC). Here we demonstrate that patient-specific human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) can recapitulate the predilection to DIC of individual patients at the cellular level. hiPSC-CMs derived from individuals with breast cancer who experienced DIC were consistently more sensitive to doxorubicin toxicity than hiPSC-CMs from patients who did not experience DIC, with decreased cell viability, impaired mitochondrial and metabolic function, impaired calcium handling, decreased antioxidant pathway activity, and increased reactive oxygen species production. Taken together, our data indicate that hiPSC-CMs are a suitable platform to identify and characterize the genetic basis and molecular mechanisms of DIC.


Subjects
Antibiotics, Antineoplastic/pharmacology , Apoptosis/drug effects , Breast Neoplasms/drug therapy , Doxorubicin/pharmacology , Heart Failure/chemically induced , Mitochondria, Heart/drug effects , Myocytes, Cardiac/drug effects , Oxidative Stress/drug effects , Adult , Aged , Antibiotics, Antineoplastic/adverse effects , Calcium/metabolism , Cardiotoxicity/genetics , Cell Survival/drug effects , DNA Damage/drug effects , Disease Susceptibility , Doxorubicin/adverse effects , Female , Flow Cytometry , Fluorescent Antibody Technique , Heart Failure/genetics , Humans , Induced Pluripotent Stem Cells , Membrane Potential, Mitochondrial/drug effects , Middle Aged , Mitochondria, Heart/metabolism , Myocytes, Cardiac/metabolism , Polymorphism, Single Nucleotide , Reactive Oxygen Species/metabolism , Real-Time Polymerase Chain Reaction , Transcriptome
12.
JACC Cardiovasc Imaging ; 8(8): 873-84, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26210695

ABSTRACT

OBJECTIVES: The purpose of this study was to evaluate whether radiation exposure from cardiac computed tomographic angiography (CTA) is associated with deoxyribonucleic acid (DNA) damage and whether damage leads to programmed cell death and activation of genes involved in apoptosis and DNA repair. BACKGROUND: Exposure to radiation from medical imaging has become a public health concern, but whether it causes significant cell damage remains unclear. METHODS: We conducted a prospective cohort study in 67 patients undergoing cardiac CTA between January 2012 and December 2013 in 2 U.S. medical centers. Median blood radiation exposure was estimated using phantom dosimetry. Biomarkers of DNA damage and apoptosis were measured by flow cytometry, whole genome sequencing, and single cell polymerase chain reaction. RESULTS: The median dose length product was 1,535.3 mGy·cm (969.7 to 2,674.0 mGy·cm). The median radiation dose to the blood was 29.8 mSv (18.8 to 48.8 mSv). Median DNA damage increased 3.39% (1.29% to 8.04%, p < 0.0001) and median apoptosis increased 3.1-fold (interquartile range [IQR]: 1.4- to 5.1-fold, p < 0.0001) post-radiation. Whole genome sequencing revealed changes in the expression of 39 transcription factors involved in the regulation of apoptosis, cell cycle, and DNA repair. Genes involved in mediating apoptosis and DNA repair were significantly changed post-radiation, including DDB2 (1.9-fold [IQR: 1.5- to 3.0-fold], p < 0.001), XRCC4 (3.0-fold [IQR: 1.1- to 5.4-fold], p = 0.005), and BAX (1.6-fold [IQR: 0.9- to 2.6-fold], p < 0.001). Exposure to radiation was associated with DNA damage (odds ratio [OR]: 1.8 [1.2 to 2.6], p = 0.003). DNA damage was associated with apoptosis (OR: 1.9 [1.2 to 5.1], p < 0.0001) and gene activation (OR: 2.8 [1.2 to 6.2], p = 0.002). 
CONCLUSIONS: Patients exposed to >7.5 mSv of radiation from cardiac CTA had evidence of DNA damage, which was associated with programmed cell death and activation of genes involved in apoptosis and DNA repair.


Subjects
Apoptosis , Biomarkers/analysis , Coronary Angiography/adverse effects , DNA Damage , Heart/radiation effects , Tomography, X-Ray Computed/adverse effects , Aged , Aged, 80 and over , Annexin A5/analysis , Ataxia Telangiectasia Mutated Proteins/analysis , Cohort Studies , DNA Repair , DNA-Binding Proteins/analysis , Female , Flow Cytometry , Histones/analysis , Humans , Immunohistochemistry , Male , Middle Aged , Phantoms, Imaging , Polymerase Chain Reaction , Prospective Studies , Sequence Analysis, DNA , Sequence Analysis, RNA , bcl-2-Associated X Protein/analysis
13.
Stat Interface ; 5(1): 21-37, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-24761189

ABSTRACT

We present a generic Bayesian framework for the peptide and protein identification in proteomics, and provide a unified interpretation for the database searching and the de novo peptide sequencing approaches that are used in peptide identification. We describe several probabilistic graphical models and a variety of prior distributions that can be incorporated into the Bayesian framework to model different types of prior information, such as the known protein sequences, the known protein abundances, the peptide precursor masses, the estimated peptide retention time and the peptide detectabilities. Various applications of the Bayesian framework are discussed theoretically, including its application to the identification of peptides containing mutations and post-translational modifications.

14.
J Comput Biol ; 16(8): 1183-93, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19645593

ABSTRACT

The protein inference problem represents a major challenge in shotgun proteomics. In this article, we describe a novel Bayesian approach to address this challenge by incorporating the predicted peptide detectabilities as the prior probabilities of peptide identification. We propose a rigorous probabilistic model for protein inference and provide practical algorithmic solutions to this problem. We used a complex synthetic protein mixture to test our method and obtained promising results.


Subjects
Algorithms , Proteomics/methods , Sequence Analysis, Protein/methods , Databases, Protein , Models, Statistical
15.
Evolution ; 62(12): 2984-94, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18752601

ABSTRACT

Rapid and inexpensive sequencing technologies are making it possible to collect whole genome sequence data on multiple individuals from a population. This type of data can be used to quickly identify genes that control important ecological and evolutionary phenotypes by finding the targets of adaptive natural selection, and we therefore refer to such approaches as "reverse ecology." To quantify the power gained in detecting positive selection using population genomic data, we compare three statistical methods for identifying targets of selection: the McDonald-Kreitman test, the mkprf method, and a likelihood implementation for detecting d(N)/d(S) > 1. Because the first two methods use polymorphism data we expect them to have more power to detect selection. However, when applied to population genomic datasets from human, fly, and yeast, the tests using polymorphism data were actually weaker in two of the three datasets. We explore reasons why the simpler comparative method has identified more genes under selection, and suggest that the different methods may really be detecting different signals from the same sequence data. Finally, we find several statistical anomalies associated with the mkprf method, including an almost linear dependence between the number of positively selected genes identified and the prior distributions used. We conclude that interpreting the results produced by this method should be done with some caution.


Subjects
Data Interpretation, Statistical , Genetics, Population , Genomics/methods , Selection, Genetic , Animals , Base Sequence , Drosophila/genetics , Humans , Molecular Sequence Data , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA
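As background for the first of the three methods compared above, the McDonald-Kreitman test contrasts nonsynonymous and synonymous counts for fixed differences between species (Dn, Ds) with the same counts for polymorphisms within a species (Pn, Ps). The sketch below computes the neutrality index, the derived estimate alpha = 1 - NI, and a two-sided Fisher exact p-value for the 2x2 table. It is a minimal illustration under the assumption that all four counts are positive; the function name and the counts are hypothetical and it is not the implementation used in the study.

```python
from math import comb
from typing import Tuple


def mk_test(dn: int, ds: int, pn: int, ps: int) -> Tuple[float, float, float]:
    """Return (neutrality index, alpha, two-sided Fisher exact p-value).

    Assumes all four counts are positive (otherwise NI is undefined).
    """
    ni = (pn / ps) / (dn / ds)   # NI < 1 suggests an excess of fixed nonsynonymous changes
    alpha = 1.0 - ni             # rough proportion of adaptive substitutions

    # Fisher exact test: rows = fixed / polymorphic, column = nonsynonymous.
    fixed, poly = dn + ds, pn + ps
    nonsyn = dn + pn
    total = fixed + poly

    def prob(k: int) -> float:
        # Hypergeometric probability of k nonsynonymous changes among fixed sites.
        return comb(fixed, k) * comb(poly, nonsyn - k) / comb(total, nonsyn)

    p_obs = prob(dn)
    k_min, k_max = max(0, nonsyn - poly), min(fixed, nonsyn)
    # Sum over all tables at least as unlikely as the observed one.
    p_value = sum(prob(k) for k in range(k_min, k_max + 1)
                  if prob(k) <= p_obs * (1 + 1e-9))
    return ni, alpha, min(1.0, p_value)


# Hypothetical counts for a single gene, for illustration only.
ni, alpha, p_value = mk_test(dn=7, ds=17, pn=2, ps=42)
```

Because the test needs polymorphism counts per gene, genes with few segregating sites give sparse tables, which may be one practical reason a divergence-only method can flag more genes in some datasets.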