ABSTRACT
Collagen VI myopathies are genetic disorders caused by mutations in the collagen VI genes COL6A1, COL6A2 and COL6A3, ranging from the severe Ullrich congenital muscular dystrophy to the milder Bethlem myopathy, which is recapitulated by collagen-VI-null (Col6a1(-/-)) mice. Abnormalities in mitochondria and the autophagic pathway have been proposed as pathogenic causes of collagen VI myopathies, but the link between collagen VI defects and these metabolic circuits remains unknown. To characterize how expression profiles are perturbed in muscles affected by collagen VI myopathies, we performed deep RNA profiling in both Col6a1(-/-) mice and patients with collagen VI pathology. The interactome map identified common pathways, suggesting a previously undetected connection between circadian genes and collagen VI pathology. Intriguingly, Bmal1(-/-) (also known as Arntl) mice, a well-characterized model displaying arrhythmic circadian rhythms, showed profound deregulation of the collagen VI pathway and of autophagy-related genes. The involvement of circadian rhythms in collagen VI myopathies is new and links autophagy and mitochondrial abnormalities. It also opens new avenues for therapies of hereditary myopathies to modulate the molecular clock or potential gene-environment interactions that might modify muscle damage pathogenesis.
Subjects
ARNTL Transcription Factors/genetics , Circadian Clocks/physiology , Collagen Type VI/genetics , Contracture/genetics , Mitochondria/physiology , Muscular Dystrophies/congenital , Mutation/genetics , Sclerosis/genetics , Animals , Autophagy/genetics , Gene Expression Profiling , Humans , Mice , Mice, Knockout , Microarray Analysis , Muscular Dystrophies/genetics , RNA/analysis
ABSTRACT
BACKGROUND: Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. RESULTS: Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. CONCLUSIONS: The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.
Subjects
Genomics , Hevea/genetics , Sequence Analysis , Allergens/genetics , Disease Resistance/genetics , Evolution, Molecular , F-Box Proteins/genetics , Genome, Plant/genetics , Haploidy , Hevea/immunology , Hevea/metabolism , Latex/metabolism , Molecular Sequence Annotation , Phylogeny , Plant Growth Regulators/genetics , Rubber/metabolism , Signal Transduction/genetics , Transcription Factors/genetics , Wood/metabolism
ABSTRACT
Aerobic methanotrophic bacteria consume methane as it diffuses away from methanogenic zones of soil and sediment. They act as a biofilter to reduce methane emissions to the atmosphere, and they are therefore targets in strategies to combat global climate change. No cultured methanotroph grows optimally below pH 5, but some environments with active methane cycles are very acidic. Here we describe an extremely acidophilic methanotroph that grows optimally at pH 2.0-2.5. Unlike the known methanotrophs, it does not belong to the phylum Proteobacteria but rather to the Verrucomicrobia, a widespread and diverse bacterial phylum that primarily comprises uncultivated species with unknown genotypes. Analysis of its draft genome detected genes encoding particulate methane monooxygenase that were homologous to genes found in methanotrophic proteobacteria. However, known genetic modules for methanol and formaldehyde oxidation were incomplete or missing, suggesting that the bacterium uses some novel methylotrophic pathways. Phylogenetic analysis of its three pmoA genes (encoding a subunit of particulate methane monooxygenase) placed them into a distinct cluster from proteobacterial homologues. This indicates an ancient divergence of Verrucomicrobia and Proteobacteria methanotrophs rather than a recent horizontal gene transfer of methanotrophic ability. The findings show that methanotrophy in the Bacteria is more taxonomically, ecologically and genetically diverse than previously thought, and that previous studies have failed to assess the full diversity of methanotrophs in acidic environments.
Subjects
Bacteria/classification , Bacteria/metabolism , Methane/metabolism , Acids/metabolism , Bacteria/enzymology , Bacteria/genetics , Geologic Sediments/microbiology , Hydrogen-Ion Concentration , Molecular Sequence Data , Oxidation-Reduction , Oxidoreductases/genetics , Oxygen/metabolism , Oxygenases/genetics , Partial Pressure , Phylogeny , RNA, Ribosomal, 16S/genetics , Temperature
ABSTRACT
A major complication in COVID-19 infection is the onset of acute respiratory distress fueled by a dysregulation of the host immune network that leads to a runaway cytokine storm. Here, we present an in silico approach that captures the host immune system's complex regulatory dynamics, allowing us to identify and rank candidate drugs and drug pairs that engage with minimal subsets of immune mediators such that their downstream interactions effectively disrupt the signaling cascades driving cytokine storm. Drug-target regulatory interactions are extracted from peer-reviewed literature using automated text-mining for over 5000 compounds associated with COVID-induced cytokine storm and elements of the underlying biology. The targets and mode of action of each compound, as well as combinations of compounds, were scored against their functional alignment with sets of competing model-predicted optimal intervention strategies, as well as the availability of like-acting compounds and known off-target effects. Top-ranking individual compounds included a number of known immune suppressors, such as calcineurin and mTOR inhibitors, as well as compounds less frequently associated with immune-modulatory effects, including antimicrobials, statins, and cholinergic agonists. Pairwise combinations of drugs targeting distinct biological pathways tended to perform significantly better than single drugs, with dexamethasone emerging as a frequent high-ranking companion. While these predicted drug combinations aim to disrupt COVID-induced acute respiratory distress syndrome, the approach itself can be applied more broadly to other diseases and may provide a standard tool for drug discovery initiatives in evaluating alternative targets and repurposing approved drugs.
Subjects
COVID-19 Drug Treatment , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Calcineurin , Cytokine Release Syndrome/drug therapy , Dexamethasone , Humans , SARS-CoV-2
ABSTRACT
Some organisms can withstand almost complete loss of body water (up to 99%) and remain in an ametabolic state for decades until rehydration, a phenomenon known as anhydrobiosis. Few multicellular eukaryotes can withstand life without water in their adult stage. We still have an incomplete understanding of the mechanism by which metazoans survive anhydrobiosis. Here we report the 255-Mb genome of Aphelenchus avenae, which can endure zero relative humidity for years. Gene duplications arose genome-wide and contributed to the expansion and diversification of 763 kinases, which represent the second largest metazoan kinome to date. Transcriptome analyses of the ametabolic state of A. avenae indicate an elevation of ATP level for global recycling of macromolecules and an enhancement of autophagy in the early stage of anhydrobiosis. We catalogue 74 species-specific intrinsically disordered proteins, which may help A. avenae survive desiccation stress. Our findings refine the molecular basis that evolved for survival of extreme water loss and open the way for discovering new anti-desiccation strategies.
Subjects
Adaptation, Biological/physiology , Desiccation , Helminth Proteins/genetics , Phosphotransferases/genetics , Tylenchida/genetics , Water/metabolism , Animals , Biological Evolution , Gene Duplication/physiology , Gene Expression Profiling , Helminth Proteins/metabolism , Humidity , Phosphotransferases/metabolism , Tylenchida/enzymology
ABSTRACT
Duchenne muscular dystrophy (DMD) is a rare genetic disease due to dystrophin gene mutations which cause progressive weakness and muscle wasting. Circadian rhythm coordinates biological processes with the 24-h cycle and it plays a key role in maintaining muscle functions, both in animal models and in humans. We explored expression profiles of circadian circuit master genes both in Duchenne muscular dystrophy skeletal muscle and in its animal model, the mdx mouse. We designed a customized, mouse-specific Fluidic-Card-TaqMan-based assay (Fluid-CIRC) containing thirty-two genes related to circadian rhythm and muscle regeneration and analyzed gastrocnemius and tibialis anterior muscles from both unexercised and exercised mdx mice. Based on this first analysis, we prioritized the 7 most deregulated genes in mdx mice and tested their expression in skeletal muscle biopsies from 10 Duchenne patients. We found that CSNK1E, SIRT1, and MYOG are upregulated in DMD patient biopsies, consistent with the mdx data. We also demonstrated that their proteins are detectable and measurable in the DMD patients' plasma. We suggest that CSNK1E, SIRT1, and MYOG might represent exploratory circadian biomarkers in DMD.
ABSTRACT
BACKGROUND: Multiple sclerosis (MS) is a complex disorder thought to result from an interaction between environmental and genetic predisposing factors which have not yet been characterised, although it is known to be associated with the HLA region on 6p21.32. Recently, a picture of chronic cerebrospinal venous insufficiency (CCSVI), consequent to stenosing venous malformation of the main extra-cranial outflow routes (VM), has been described in patients affected with MS, introducing an additional phenotype with possible pathogenic significance. METHODS: In order to explore the presence of copy number variations (CNVs) within the HLA locus, a custom CGH array was designed to cover 7 Mb of the HLA locus region (6,899,999 bp; chr6:29,900,001-36,800,000). Genomic DNA of the 15 patients with CCSVI/VM and MS was hybridised in duplicate. RESULTS: In total, 322 CNVs, of which 225 were extragenic and 97 intragenic, were identified in 15 patients. 234 known polymorphic CNVs were detected, the majority of these being situated in non-coding or extragenic regions. The overall number of CNVs (both extra- and intragenic) showed a robust and significant correlation with the number of stenosing VMs (Spearman: r = 0.6590, p = 0.0104; linear regression analysis r = 0.6577, p = 0.0106). The region we analysed contains 211 known genes. By using pathway analysis focused on angiogenesis and venous development, MS, and immunity, we tentatively highlight several genes as possible susceptibility factor candidates involved in this peculiar phenotype. CONCLUSIONS: The CNVs contained in the HLA locus region in patients with the novel phenotype of CCSVI/VM and MS were mapped in detail, demonstrating a significant correlation between the number of known CNVs found in the HLA region and the number of CCSVI-VMs identified in patients. Pathway analysis revealed common routes of interaction of several of the genes involved in angiogenesis and immunity contained within this region. 
Despite the small sample size in this pilot study, it does suggest that the number of multiple polymorphic CNVs in the HLA locus deserves further study, owing to their possible involvement in susceptibility to this novel MS/VM plus phenotype, and perhaps even other types of the disease.
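The Spearman coefficient quoted in the RESULTS is a Pearson correlation computed on ranks rather than on raw values. The following is a minimal sketch of that calculation, assuming tied values receive their average rank; the function names and the toy numbers are illustrative assumptions and are not the study's data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical toy input: CNV counts and number of venous malformations per patient.
cnv_counts = [12, 18, 25, 31]
malformations = [1, 2, 3, 4]
rho = spearman(cnv_counts, malformations)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone rescaling of either variable, which suits count data from a small pilot cohort.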
Subjects
Chromosomes, Human, Pair 6 , Genetic Variation , HLA Antigens/genetics , HLA-DR Antigens/genetics , Multiple Sclerosis, Relapsing-Remitting/genetics , Multiple Sclerosis/genetics , Veins/abnormalities , Chromosome Mapping , Comparative Genomic Hybridization , Exons/genetics , Genotype , HLA-DRB1 Chains , Humans , Introns/genetics , Multiple Sclerosis/immunology , Multiple Sclerosis/physiopathology , Multiple Sclerosis, Relapsing-Remitting/immunology , Phenotype , Polymorphism, Genetic , Severity of Illness Index
ABSTRACT
BACKGROUND: Uncovering cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. Protein functions, often referred to as its annotations, are believed to manifest themselves through topology of the networks of inter-proteins interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having the general biological significance, such demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interactions datasets. RESULTS: We developed a method for automatic extraction of protein functional annotation from scientific text based on the Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. 
An increase in the number and size of GO groups without any noticeable decrease of the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We revealed that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller. CONCLUSION: Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.
Subjects
Computational Biology/methods , Databases, Genetic/classification , Genes , Pattern Recognition, Automated/methods , Proteins/physiology , Cluster Analysis , Computational Biology/standards , Databases, Genetic/standards , Databases, Protein , Information Storage and Retrieval , Natural Language Processing , Pattern Recognition, Automated/standards , Protein Interaction Mapping , PubMed , Reproducibility of Results , Terminology as Topic
ABSTRACT
Microarray-based characterization of tissues, cellular and disease states, and environmental condition and treatment responses provides genome-wide snapshots containing large amounts of invaluable information. However, the lack of inherent structure within the data and strong noise make extracting and interpreting this information and formulating and prioritizing domain relevant hypotheses difficult tasks. Integration with different types of biological data is required to place the expression measurements into a biologically meaningful context. A few approaches in microarray data interpretation are discussed with the emphasis on the use of molecular network information. Statistical procedures are demonstrated that superimpose expression data onto the transcription regulation network mined from scientific literature and aim at selecting transcription regulators with significant patterns of expression changes downstream. Tests are suggested that take into account network topology and signs of transcription regulation effects. The approaches are illustrated using two different expression datasets, the performance is compared, and biological relevance of the predictions is discussed.
Subjects
Algorithms , Gene Expression Profiling/methods , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Transcription, Genetic/physiology , Computer Simulation
ABSTRACT
I describe the approaches for choosing primer parameters and calculating primer properties to build a statistical model for PCR primer design. Statistical modeling allows you to fine-tune the PCR primer design for your standard PCR conditions. It is most appropriate for large organizations that routinely perform PCR on a large scale or for instruments that utilize PCR. This chapter shows how to use the statistical model to optimize the PCR primer design and to cluster primers for multiplex PCR. These methods have been developed to optimize the single-nucleotide polymorphism-identification technology (SNP-IT) reaction for SNP genotyping and are implemented in the Autoprimer program (http://www.autoprimer.com). The approaches for combining the individual primer scores into a statistical model are described in the next chapter.
Subjects
DNA Primers/chemistry , Genome , Models, Statistical , Polymerase Chain Reaction , Sequence Analysis, DNA , Software , Internet , Models, Chemical , Predictive Value of Tests
ABSTRACT
This chapter describes a statistical method that can be used to predict the success or failure of a designed primer based on properties of the genomic sequence surrounding the primer extension, using the user's own existing genotyping database. After scores that measure properties of the genomic sequence surrounding the primer extension are developed as described in previous chapters, this chapter first shows how to use simple statistics to evaluate the correlation between each score and the likelihood of primer success or failure based on the user's own empirical data. All scores that show significant correlations with primer success are kept for further analysis. Next, the logistic regression method is described in detail to estimate the contribution of each primer score to the overall primer success/failure rate when all significant scores are weighted simultaneously to produce the logistic regression model. Statistics that evaluate model fit and model discrimination are provided as well. Last, all significant scores are combined into one measure that can predict the overall success/failure rate of a given primer design. The estimated logistic regression score allows prioritization of primers, selection of the best possible primer pair, and combining primers into the best clusters for multiplex PCR. Software and hardware requirements and sample SAS programs are also included.
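The step of combining several scores into one success measure can be sketched as follows. This is a minimal illustration of applying an already fitted logistic regression model, not the chapter's SAS programs; the score names, weights, intercept, and primer values are all hypothetical assumptions chosen only to make the example run.

```python
import math

def success_probability(scores, weights, intercept):
    """Logistic model: P(success) = 1 / (1 + exp(-(b0 + sum_i w_i * s_i)))."""
    z = intercept + sum(weights[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients (in practice estimated from empirical genotyping data).
WEIGHTS = {"signal": 1.2, "mispriming": -0.8, "hairpin": -0.5}
INTERCEPT = 0.3

# Hypothetical per-primer scores.
primers = {
    "primer_A": {"signal": 1.0, "mispriming": 0.2, "hairpin": 0.1},
    "primer_B": {"signal": 0.4, "mispriming": 1.5, "hairpin": 0.9},
}

# Rank primers from most to least likely to succeed.
ranked = sorted(
    primers,
    key=lambda name: success_probability(primers[name], WEIGHTS, INTERCEPT),
    reverse=True,
)
```

Ranking by the predicted probability is what allows primers to be prioritized and the best pair to be selected, since every candidate is placed on the same 0 to 1 scale.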
Subjects
DNA Primers/chemistry , Genome , Logistic Models , Models, Chemical , Polymerase Chain Reaction , Sequence Analysis, DNA , Predictive Value of Tests
ABSTRACT
We demonstrate that protein-protein interaction networks in several eukaryotic organisms contain significantly more self-interacting proteins than expected if such homodimers had appeared randomly in the course of evolution. We also show that on average homodimers have twice as many interaction partners as non-self-interacting proteins. More specifically, the likelihood of a protein physically interacting with itself was found to be proportional to the total number of its binding partners. These properties of dimers are in agreement with a phenomenological model, in which individual proteins differ from each other by the degree of their 'stickiness' or general propensity toward interaction with other proteins including themselves. A duplication of a self-interacting protein creates a pair of paralogous proteins interacting with each other. We show that such pairs occur more frequently than could be explained by chance alone. Similar to homodimers, proteins involved in heterodimers with their paralogs have on average twice as many interacting partners as the rest of the network. The likelihood of a pair of paralogous proteins to interact with each other was also shown to decrease with their sequence similarity. This points to the conclusion that most interactions between paralogs are inherited from ancestral homodimeric proteins, rather than established de novo after duplication. We finally discuss possible implications of our empirical observations from functional and evolutionary standpoints.
Subjects
Biological Evolution , Multiprotein Complexes/metabolism , Animals , Dimerization , Humans , Multiprotein Complexes/chemistry , Protein Binding , Two-Hybrid System Techniques
ABSTRACT
Jute (Corchorus sp.) is one of the most important sources of natural fibre, covering ~80% of global bast fibre production [1]. Only Corchorus olitorius and Corchorus capsularis are commercially cultivated, though there are more than 100 Corchorus species [2] in the Malvaceae family. Here we describe high-quality draft genomes of these two species and their comparisons at the functional genomics level to support tailor-designed breeding. The assemblies cover 91.6% and 82.2% of the estimated genome sizes for C. olitorius and C. capsularis, respectively. In total, 37,031 C. olitorius and 30,096 C. capsularis genes are identified, and most of the genes are validated by cDNA and RNA-seq data. Analyses of clustered gene families and gene collinearity show that jute underwent shared whole-genome duplication ~18.66 million years (Myr) ago prior to speciation. RNA expression analysis from isolated fibre cells reveals the key regulatory and structural genes involved in fibre formation. This work expands our understanding of the molecular basis of fibre formation, laying the foundation for the genetic improvement of jute.
Subjects
Corchorus/genetics , Genome, Plant , Corchorus/metabolism , Genes, Plant , Genomics , Phylogeny , Plant Breeding , Species Specificity
ABSTRACT
BACKGROUND: Scientific literature is a source of the most reliable and comprehensive knowledge about molecular interaction networks. Formalization of this knowledge is necessary for computational analysis and is achieved by automatic fact extraction using various text-mining algorithms. Most of these techniques suffer from high false positive rates and redundancy of the extracted information. The extracted facts form a large network with no pathways defined. RESULTS: We describe the methodology for automatic curation of Biological Association Networks (BANs) derived by a natural language processing technology called MedScan. The curated data is used for automatic pathway reconstruction. The algorithm for the reconstruction of signaling pathways is also described and validated by comparison with manually curated pathways and tissue-specific gene expression profiles. CONCLUSION: Biological Association Networks extracted by MedScan technology contain sufficient information for constructing thousands of mammalian signaling pathways for multiple tissues. The automatically curated MedScan data is adequate for automatic generation of good quality signaling networks. The automatically generated Regulome pathways and the manually curated pathways used for their validation are available free in the ResNetCore database from Ariadne Genomics, Inc. [1]. The pathways can be viewed and analyzed using a free demo version of PathwayStudio software. The MedScan technology is also available for evaluation using the free demo version of PathwayStudio software.
Subjects
Databases, Bibliographic , Natural Language Processing , Periodicals as Topic , Protein Interaction Mapping/methods , Proteins/classification , Proteins/metabolism , Signal Transduction/physiology , Information Storage and Retrieval/methods , Software
ABSTRACT
The decrease in the drug approval rate by the FDA and the recent failure of some blockbuster drugs have prompted a re-examination of the pharmaceutical industry's focus on increasing drug selectivity. As a result, it has been proposed that the most efficient cures lie in developing promiscuous drugs and selective drug mixtures. Rational design of drug mixtures has been nearly impossible due to the lack of information about in vivo cell regulation, mechanisms of pathway activation, and interactions between different pathways in vivo. We review the current state of the art for rational design of combination therapy and argue that the current industry-wide development of the infrastructure for pathway analysis provides an unprecedented opportunity for the rational design of multicomponent and multifunctional drugs. We propose several ways to use pathway analysis to rationally combine known drugs, either to synergize their efficacy or to suppress individual side effects.
Subjects
Computer-Aided Design , Drug Combinations , Drug Design , Gene Regulatory Networks , Metabolic Networks and Pathways , Pharmaceutical Preparations/chemistry , Systems Biology , Technology, Pharmaceutical/methods , Animals , Databases, Factual , Drug Interactions , Drug-Related Side Effects and Adverse Reactions , Humans , Models, Biological , Models, Molecular , Molecular Structure , Software , Structure-Activity Relationship
ABSTRACT
Using an empirical panel of more than 20 000 single base primer extension (SNP-IT) assays we have developed a set of statistical scores for evaluating and rank ordering various parameters of the SNP-IT reaction to facilitate high-throughput assay primer design with improved likelihood of success. Each score predicts either signal magnitude from primer extension or signal noise caused by mispriming of primers and structure of the PCR product. All scores have been shown to correlate with the success/failure rate of the SNP-IT reaction, based on analysis of assay results. A logistic regression analysis was applied to combine all scored parameters into one measure predicting the overall success/failure rate of a given SNP marker. Three training sets for different types of SNP-IT reaction, each containing about 22 000 SNP markers, were used to assign weights to each score and optimize the prediction of the combined measure. c-Statistics of 0.69, 0.77 and 0.72 were achieved for three training sets. This new statistical prediction can be used to improve primer design for the SNP-IT reaction and evaluate the probability of genotyping success for a given SNP based on analysis of the surrounding genomic sequence.
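The c-statistic reported above measures discrimination: it is the probability that a randomly chosen successful assay receives a higher predicted score than a randomly chosen failed one, with ties counted as one half. A minimal sketch of that definition follows; the function name and the toy scores and outcomes are illustrative assumptions, not data from the study.

```python
def c_statistic(scores, outcomes):
    """Fraction of (success, failure) pairs ranked correctly; ties count 0.5."""
    successes = [s for s, y in zip(scores, outcomes) if y == 1]
    failures = [s for s, y in zip(scores, outcomes) if y == 0]
    if not successes or not failures:
        raise ValueError("need at least one success and one failure")
    concordant = 0.0
    for s in successes:
        for f in failures:
            if s > f:
                concordant += 1.0
            elif s == f:
                concordant += 0.5
    return concordant / (len(successes) * len(failures))

# Hypothetical predicted success scores and observed outcomes (1 = success, 0 = failure).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.6]
outcomes = [1, 1, 1, 0, 0, 0]
c = c_statistic(scores, outcomes)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.69 to 0.77 indicate useful but imperfect discrimination. The pairwise loop is quadratic in the number of assays; for panels of tens of thousands of markers a rank-based formula would be the more practical choice.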
Subjects
DNA Primers , Genotype , Logistic Models , Polymerase Chain Reaction/methods , Reproducibility of Results
ABSTRACT
INTRODUCTION: There is a certain degree of frustration and discontent in the area of microarray gene expression data analysis of cancer datasets. It arises from the mathematical problem called the 'curse of dimensionality,' which is due to the small number of samples available in the training sets used for calculating transcriptional signatures from the large number of differentially expressed (DE) genes measured by microarrays. The new generation of causal reasoning algorithms can provide solutions to the curse of dimensionality by transforming microarray data into the activity of a small number of cancer hallmark pathways. This new approach can make feature space dimensionality optimal for mathematical signature calculations. AREAS COVERED: The author reviews the reasons behind the current frustration with transcriptional signatures derived from DE genes in cancer. The author also provides an overview of the novel methods for signature calculations based on differentially variable genes and expression regulators. Furthermore, the author provides perspectives on causal reasoning algorithms that use prior knowledge about regulatory events described in the scientific literature to identify expression regulators responsible for the differential expression observed in cancer samples. EXPERT OPINION: The author advocates causal reasoning methods to calculate cancer pathway activity signatures. The current challenge for these algorithms is ensuring the quality of the knowledgebase. Indeed, the development of cancer hallmark pathway collections, together with statistical algorithms to transform the activity of expression regulators into pathway activity, is necessary for causal reasoning to be used in cancer research.
Subjects
Gene Expression Profiling/methods , Neoplasms/therapy , Transcriptome , Animals , Humans , MicroRNAs/genetics , Molecular Targeted Therapy , Neoplasms/genetics
ABSTRACT
BACKGROUND: SNP genotyping typically incorporates a review step to ensure that the genotype calls for a particular SNP are correct. For high-throughput genotyping, such as that provided by the GenomeLab SNPstream instrument from Beckman Coulter, Inc., the manual review used for low-volume genotyping becomes a major bottleneck. The work reported here describes the application of a neural network to automate the review of results. RESULTS: We describe an approach to reviewing the quality of primer extension 2-color fluorescent reactions by clustering optical signals obtained from multiple samples and a single reaction set-up. The method evaluates the quality of the signal clusters from the genotyping results. We developed 64 scores to measure the geometry and position of the signal clusters. The expected signal distribution was represented by a distribution of a 64-component parametric vector obtained by training the two-layer neural network on a set of 10,968 manually reviewed 2D plots containing the signal clusters. CONCLUSION: The neural network approach described in this paper may be used with results from the GenomeLab SNPstream instrument for high-throughput SNP genotyping. The overall correlation with manual review was 0.844. The approach can also be applied to a quality review of results from other high-throughput fluorescence-based biochemical assays.
Subjects
Artificial Intelligence , Automation , DNA Primers/genetics , DNA Primers/metabolism , Fluorescent Dyes/metabolism , Cluster Analysis , Genotype , Humans , Polymerase Chain Reaction/methods , Polymerase Chain Reaction/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Predictive Value of Tests , Quality Control , Reproducibility of Results , Software/statistics & numerical data
ABSTRACT
OBJECTIVE: The aim of this study was to develop a practical and efficient protein identification system for biomedical corpora. DESIGN: The developed system, called ProtScan, utilizes a carefully constructed dictionary of mammalian proteins in conjunction with a specialized tokenization algorithm to identify and tag protein name occurrences in biomedical texts and also takes advantage of Medline "Name-of-Substance" (NOS) annotation. The dictionaries for ProtScan were constructed in a semi-automatic way from various public-domain sequence databases followed by an intensive expert curation step. MEASUREMENTS: The recall and precision of the system have been determined using 1000 randomly selected and hand-tagged Medline abstracts. RESULTS: The developed system is capable of identifying protein occurrences in Medline abstracts with a 98% precision and 88% recall. It was also found to be capable of processing approximately 300 abstracts per second. Without utilization of NOS annotation, precision and recall were found to be 98.5% and 84%, respectively. CONCLUSION: The developed system appears to be well suited for protein-based Medline indexing and can help to improve biomedical information retrieval. Further approaches to ProtScan's recall improvement also are discussed.
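The precision and recall figures above compare system-tagged protein occurrences against hand-tagged ones. A minimal sketch of that comparison is given below, assuming each occurrence is represented as an (abstract id, protein name) pair; this representation, the function name, and the toy data are assumptions for illustration and do not describe ProtScan's internals.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for system-tagged versus hand-tagged items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy data: (abstract_id, protein_name) pairs.
gold = {(1, "TP53"), (1, "MDM2"), (2, "EGFR"), (3, "AKT1")}
predicted = {(1, "TP53"), (2, "EGFR"), (3, "AKT1"), (3, "CAT")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision penalizes spurious tags (here the ambiguous token "CAT"), while recall penalizes missed names (here "MDM2"), which is why a dictionary-based tagger can score high on the first and lower on the second.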