Results 1 - 20 of 26
1.
J Exp Bot ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38954539

ABSTRACT

Linear mixed models (LMMs) are a commonly used method for genome-wide association studies (GWAS) that aim to detect associations between genetic markers and phenotypic measurements in a population of individuals while accounting for population structure and cryptic relatedness. In a standard GWAS, hundreds of thousands to millions of statistical tests are performed, requiring control for multiple hypothesis testing. Typically, static corrections that penalize the number of tests performed are used to control for the family-wise error rate, which is the probability of making at least one false positive. However, it has been shown that in practice this threshold is too conservative for normally distributed phenotypes and not stringent enough for non-normally distributed phenotypes. Therefore, permutation-based LMM approaches have recently been proposed to provide a more realistic threshold that takes phenotypic distributions into account. In this work, we will discuss the advantages of permutation-based GWAS approaches, including new simulations and results from a re-analysis of all publicly available Arabidopsis thaliana phenotypes from the AraPheno database.
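
The static correction described in this abstract can be illustrated with a minimal sketch of a Bonferroni cutoff, which divides the target family-wise error rate by the number of tests. The function names and the example values are illustrative assumptions and do not come from the paper.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that bounds the family-wise error rate by alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Indices of markers whose p-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < cutoff]
```

Because the cutoff depends only on the number of tests, it ignores the phenotype distribution, which is the limitation the permutation-based approaches in the abstract aim to address.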

2.
Brief Bioinform ; 22(1): 178-193, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31848574

ABSTRACT

Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and by computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).


Subjects
Metagenomics/methods , Microbiota/genetics , Practice Guidelines as Topic , Animals , Taxonomic DNA Barcoding/methods , Taxonomic DNA Barcoding/standards , Humans , Metagenomics/standards , 16S Ribosomal RNA/genetics , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards
3.
Bioinformatics ; 38(Suppl_2): ii5-ii12, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124808

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear mixed models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest, while at the same time accounting for population structure and cryptic relatedness. Assumptions of LMMs include a normal distribution of the residuals and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice, they are rarely implemented due to the high computational complexity. RESULTS: We propose permGWAS, an efficient LMM reformulation based on 4D tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds have lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than 8 days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results. AVAILABILITY AND IMPLEMENTATION: permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Genetic Markers , Genome-Wide Association Study/methods , Genotype , Linear Models , Phenotype
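
The idea of a permutation-based significance threshold can be sketched as follows. This is a deliberately simplified illustration, not the permGWAS implementation: it assumes a plain squared Pearson correlation as the per-marker test statistic instead of a linear mixed model, applies no correction for population structure, and uses hypothetical function names. The phenotype is shuffled repeatedly, the largest statistic across all markers is recorded for each shuffle, and the threshold is taken as an upper quantile of those maxima.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype carries no signal
    return sxy * sxy / (sxx * syy)


def max_statistic_threshold(genotypes, phenotype, n_permutations=100,
                            alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the per-permutation maximum statistic.

    genotypes: list of markers, each a list with one value per individual.
    phenotype: list with one value per individual.
    """
    rng = random.Random(seed)
    permuted = list(phenotype)
    max_stats = []
    for _ in range(n_permutations):
        rng.shuffle(permuted)  # break any genotype-phenotype link
        max_stats.append(
            max(squared_correlation(marker, permuted) for marker in genotypes)
        )
    max_stats.sort()
    index = min(len(max_stats) - 1, int((1 - alpha) * len(max_stats)))
    return max_stats[index]
```

A marker would then be called significant when its observed statistic exceeds the returned threshold. Because the maxima are computed on the actual phenotype values, the threshold adapts to skewed distributions, unlike a fixed Bonferroni cutoff.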
4.
Health Care Manag Sci ; 26(4): 785-806, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38015289

ABSTRACT

Assigning inpatients to hospital beds impacts patient satisfaction and the workload of nurses and doctors. The assignment is subject to unknown inpatient arrivals, in particular for emergency patients. Hospitals, therefore, need to deal with uncertainty about actual bed requirements and potential shortage situations, as bed capacities are limited. This paper develops a model and solution approach for solving the patient bed-assignment problem that is based on a machine learning (ML) approach to forecasting emergency patients. First, it contributes by improving the anticipation of emergency patients using ML approaches, incorporating weather data, time and dates, important local and regional events, as well as current and historical occupancy levels. Drawing on real-life data from a large case hospital, we were able to improve forecasting accuracy for emergency inpatient arrivals. We achieved up to 17% better root mean square error (RMSE) when using ML methods compared to a baseline approach relying on averages for historical arrival rates. We further show that the ML methods outperform time series forecasts. Second, we develop a new hyper-heuristic for solving real-life problem instances based on the pilot method and a specialized greedy look-ahead (GLA) heuristic. When applying the hyper-heuristic to test sets, we were able to increase the objective function by up to 5.3% in comparison to the benchmark approach in [40]. A benchmark against a genetic algorithm also shows the superiority of the hyper-heuristic. Third, the combination of ML for emergency patient admission forecasting with advanced optimization through the hyper-heuristic allowed us to obtain an improvement of up to 3.3% on a real-life problem.


Subjects
Hospital Emergency Service , Hospitalization , Humans , Hospitals , Patient Admission , Machine Learning
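
The forecasting comparison in this abstract is reported as a relative RMSE improvement over a baseline. A minimal sketch of both quantities is given below; the function names are hypothetical and the numbers in the usage note are made up for illustration, not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_error, model_error):
    """Fraction by which the model error is lower than the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error
```

For example, a baseline RMSE of 10.0 arrivals per day and a model RMSE of 8.3 would correspond to a relative improvement of about 0.17, that is, 17%.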
5.
Bioinformatics ; 37(1): 57-65, 2021 04 09.
Article in English | MEDLINE | ID: mdl-32573681

ABSTRACT

MOTIVATION: Correlating genetic loci with a disease phenotype is a common approach to improve our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci results in a computational and statistical bottleneck due to the vast amount of candidate interactions. RESULTS: We propose a novel method SiNIMin that addresses these two aspects by finding pairs of interacting genes that are, upon combination, associated with a phenotype of interest under a model of genetic heterogeneity. We guide the interaction search using biological prior knowledge in the form of protein-protein interaction networks. Our method controls type I error and outperforms state-of-the-art methods with respect to statistical power. Additionally, we find novel associations for multiple Arabidopsis thaliana phenotypes, and, with an adapted variant of SiNIMin, for a study of rare variants in migraine patients. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/SiNIMin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Heterogeneity , Protein Interaction Maps , Genetic Loci , Humans , Phenotype , Software
6.
Nucleic Acids Res ; 48(D1): D1063-D1068, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31642487

ABSTRACT

Genome-wide association studies (GWAS) are integral for studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout-mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28 000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.


Subjects
Arabidopsis/genetics , Computational Biology , Genetic Databases , Plant Genome , Genome-Wide Association Study , Phenotype , Computational Biology/methods , Gene Knockout Techniques , Genome-Wide Association Study/methods , Genotype , Mutation , Quantitative Trait Loci , Heritable Quantitative Trait , RNA Sequence Analysis , Web Browser
7.
PLoS Genet ; 14(2): e1007155, 2018 02.
Article in English | MEDLINE | ID: mdl-29432421

ABSTRACT

By following the evolution of populations that are initially genetically homogeneous, much can be learned about core biological principles. For example, it allows for detailed studies of the rate of emergence of de novo mutations and their change in frequency due to drift and selection. Unfortunately, in multicellular organisms with generation times of months or years, it is difficult to set up and carry out such experiments over many generations. An alternative is provided by "natural evolution experiments" that started from colonizations or invasions of new habitats by selfing lineages. With limited or missing gene flow from other lineages, new mutations and their effects can be easily detected. North America has been colonized in historic times by the plant Arabidopsis thaliana, and although multiple intercrossing lineages are found today, many of the individuals belong to a single lineage, HPG1. To determine the rate of substitutions in this lineage (the subset of mutations that survived natural selection and drift), we have sequenced genomes from plants collected between 1863 and 2006. We identified 73 modern and 27 herbarium specimens that belonged to HPG1. Using the estimated substitution rate, we infer that the last common HPG1 ancestor lived in the early 17th century, when it was most likely introduced by chance from Europe. Mutations in coding regions are depleted in frequency compared to those in other portions of the genome, consistent with purifying selection. Nevertheless, a handful of mutations is found at high frequency in present-day populations. We link these to detectable phenotypic variance in traits of known ecological importance, life history and growth, which could reflect their adaptive value. Our work showcases how, by applying genomics methods to a combination of modern and historic samples from colonizing lineages, we can directly study new mutations and their potential evolutionary relevance.


Subjects
Plant Genome , Mutation Rate , Mutation/physiology , Plant Development/genetics , Arabidopsis/genetics , Arabidopsis/growth & development , Genetic Crosses , Directed Molecular Evolution , Molecular Evolution , Gene Flow/physiology , Introduced Species , Phenotype , Phylogeny , Plant Weeds/genetics , Plant Weeds/growth & development , Genetic Selection , DNA Sequence Analysis
8.
Plant Cell ; 29(1): 5-19, 2017 01.
Article in English | MEDLINE | ID: mdl-27986896

ABSTRACT

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.


Subjects
Computational Biology/methods , Plant Genome/genetics , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Arabidopsis/genetics , Arabidopsis/growth & development , Flowers/genetics , Flowers/growth & development , Genotype , Humans , Phenotype , Reproducibility of Results , Software , User-Computer Interface
9.
Nucleic Acids Res ; 46(D1): D1150-D1156, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29059333

ABSTRACT

The abundance of high-quality genotype and phenotype data for the model organism Arabidopsis thaliana enables scientists to study the genetic architecture of many complex traits at an unprecedented level of detail using genome-wide association studies (GWAS). GWAS have been a great success in A. thaliana and many SNP-trait associations have been published. With the AraGWAS Catalog (https://aragwas.1001genomes.org) we provide a publicly available, manually curated and standardized GWAS catalog for all publicly available phenotypes from the central A. thaliana phenotype repository, AraPheno. All GWAS have been recomputed on the latest imputed genotype release of the 1001 Genomes Consortium using a standardized GWAS pipeline to ensure comparability between results. The catalog currently includes 167 phenotypes and more than 222 000 SNP-trait associations with P < 10^-4, of which 3887 are significantly associated using permutation-based thresholds. The AraGWAS Catalog can be accessed via a modern web-interface and provides various features to easily access, download and visualize the results and summary statistics across GWAS.


Subjects
Arabidopsis/genetics , Genetic Databases , Genome-Wide Association Study , Single Nucleotide Polymorphism , User-Computer Interface
11.
Proc Natl Acad Sci U S A ; 113(46): E7317-E7326, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27803326

ABSTRACT

The ubiquity of nonparental hybrid phenotypes, such as hybrid vigor and hybrid inferiority, has interested biologists for over a century and is of considerable agricultural importance. Although examples of both phenomena have been subject to intense investigation, no general model for the molecular basis of nonadditive genetic variance has emerged, and prediction of hybrid phenotypes from parental information continues to be a challenge. Here we explore the genetics of hybrid phenotype in 435 Arabidopsis thaliana individuals derived from intercrosses of 30 parents in a half diallel mating scheme. We find that nonadditive genetic effects are a major component of genetic variation in this population and that the genetic basis of hybrid phenotype can be mapped using genome-wide association (GWA) techniques. Significant loci together can explain as much as 20% of phenotypic variation in the surveyed population and include examples that have both classical dominant and overdominant effects. One candidate region inherited dominantly in the half diallel contains the gene for the MADS-box transcription factor AGAMOUS-LIKE 50 (AGL50), which we show directly to alter flowering time in the predicted manner. Our study not only illustrates the promise of GWA approaches to dissect the genetic architecture underpinning hybrid performance but also demonstrates the contribution of classical dominance to genetic variance.


Subjects
Arabidopsis/genetics , Hybrid Vigor/genetics , Genetic Crosses , Genetic Variation , Genetic Hybridization , Phenotype
12.
Mol Biol Evol ; 33(9): 2257-72, 2016 09.
Article in English | MEDLINE | ID: mdl-27189551

ABSTRACT

Understanding how new species form requires investigation of evolutionary forces that cause phenotypic and genotypic changes among populations. However, the mechanisms underlying speciation vary and little is known about whether genomes diversify in the same ways in parallel at the incipient scale. We address this using the nematode, Pristionchus pacificus, which resides at an interesting point on the speciation continuum (distinct evolutionary lineages without reproductive isolation), and inhabits heterogeneous environments subject to divergent environmental pressures. Using whole genome re-sequencing of 264 strains, we estimate F_ST to identify outlier regions of extraordinary differentiation (∼1.725 Mb of the 172.5 Mb genome). We find evidence for shared divergent genomic regions occurring at a higher frequency than expected by chance among populations of the same evolutionary lineage. We use allele frequency spectra to find that, among lineages, 53% of divergent regions are consistent with adaptive selection, whereas 24% and 23% of such regions suggest background selection and restricted gene flow, respectively. In contrast, among populations from the same lineage, similar proportions (34-48%) of divergent regions correspond to adaptive selection and restricted gene flow, whereas 13-22% suggest background selection. Because speciation often involves phenotypic and genomic divergence, we also evaluate phenotypic variation, focusing on pH tolerance, which we find is diverging in a manner corresponding to environmental differences among populations. Taking a genome-wide association approach, we functionally validate a significant genotype-phenotype association for this trait. Our results are consistent with P. pacificus undergoing heterogeneous genotypic and phenotypic diversification related to both evolutionary and environmental processes.


Subjects
Rhabditida/genetics , Animals , Biological Evolution , Molecular Evolution , Gene Flow , Gene Frequency , Genetic Association Studies , Genetic Speciation , Genetic Variation , Reproductive Isolation , Genetic Selection , Transcriptome
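
The fixation index used in this abstract to flag highly differentiated regions can be sketched for the simplest case. The abstract does not state which estimator or cutoff rule the authors used, so the code below is an assumption for illustration only: it computes Wright's fixation index for one biallelic site in two equally weighted populations, and ranks windows to keep a top fraction (the reported outlier span of about 1.725 Mb out of 172.5 Mb corresponds to 1% of the genome). All function names are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's fixation index for one biallelic site, two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # site is monomorphic overall, so there is no differentiation
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def top_fraction_outliers(values, fraction=0.01):
    """Indices of the highest-scoring windows, keeping at least one."""
    n_keep = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:n_keep])
```

Values range from 0 (identical allele frequencies) to 1 (populations fixed for different alleles).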
13.
Bioinformatics ; 31(12): i240-9, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072488

ABSTRACT

MOTIVATION: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals. RESULTS: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping. CONCLUSIONS: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes. AVAILABILITY AND IMPLEMENTATION: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html.


Subjects
Genetic Heterogeneity , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Algorithms , Arabidopsis/genetics , Phenotype
14.
Hum Mutat ; 36(5): 513-23, 2015 May.
Article in English | MEDLINE | ID: mdl-25684150

ABSTRACT

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Due to the wealth of these methods, an important practical question to answer is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. We here demonstrate in a study of 10 tools on five datasets that such a comparative evaluation of these tools is hindered by two types of circularity: they arise due to (1) the same variants or (2) different variants from the same protein occurring both in the datasets used for training and for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.


Subjects
Computational Biology/methods , Missense Mutation , Software , Datasets as Topic , Humans , Internet , Reproducibility of Results , Web Browser
15.
Sci Data ; 11(1): 109, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263173

ABSTRACT

Sustainable weed management strategies are critical to feeding the world's population while preserving ecosystems and biodiversity. Therefore, site-specific weed control strategies based on automation are needed to reduce the additional time and effort required for weeding. Machine vision-based methods appear to be a promising approach for weed detection, but require high quality data on the species in a specific agricultural area. Here we present a dataset, the Moving Fields Weed Dataset (MFWD), which captures the growth of 28 weed species commonly found in sorghum and maize fields in Germany. A total of 94,321 images were acquired in a fully automated, high-throughput phenotyping facility to track over 5,000 individual plants at high spatial and temporal resolution. A rich set of manually curated ground truth information is also provided, which can be used not only for plant species classification, object detection and instance segmentation tasks, but also for multiple object tracking.

16.
Bioinform Adv ; 3(1): vbad035, 2023.
Article in English | MEDLINE | ID: mdl-37066135

ABSTRACT

Summary: Predicting complex traits from genotypic information is a major challenge in various biological domains. With easyPheno, we present a comprehensive Python framework enabling the rigorous training, comparison and analysis of phenotype predictions for a variety of different models, ranging from common genomic selection approaches to classical machine learning and modern deep learning-based techniques. Our framework is easy to use, even for non-programming experts, and includes an automatic hyperparameter search using state-of-the-art Bayesian optimization. Moreover, easyPheno provides various benefits for bioinformaticians developing new prediction models. easyPheno makes it possible to quickly integrate novel models and functionalities into a reliable framework and to benchmark against various integrated prediction models in a comparable setup. In addition, the framework allows the assessment of newly developed prediction models under pre-defined settings using simulated data. We provide detailed documentation with various hands-on tutorials and videos explaining the usage of easyPheno to novice users. Availability and implementation: easyPheno is publicly available at https://github.com/grimmlab/easyPheno and can be easily installed as a Python package via https://pypi.org/project/easypheno/ or using Docker. Comprehensive documentation, including various tutorials complemented with videos, can be found at https://easypheno.readthedocs.io/. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

17.
Plant Methods ; 19(1): 87, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37608384

ABSTRACT

BACKGROUND: Efficient and site-specific weed management is a critical step in many agricultural tasks. Image captures from drones and modern machine learning based computer vision methods can be used to assess weed infestation in agricultural fields more efficiently. However, the image quality of the captures can be affected by several factors, including motion blur. Image captures can be blurred because the drone moves during the image capturing process, e.g. due to wind pressure or camera settings. These influences complicate the annotation of training and test samples and can also lead to reduced predictive power in segmentation and classification tasks. RESULTS: In this study, we propose DeBlurWeedSeg, a combined deblurring and segmentation model for weed and crop segmentation in motion blurred images. For this purpose, we first collected a new dataset of matching sharp and naturally blurred image pairs of real sorghum and weed plants from drone images of the same agricultural field. The data was used to train and evaluate the performance of DeBlurWeedSeg on both sharp and blurred images of a hold-out test-set. We show that DeBlurWeedSeg outperforms a standard segmentation model that does not include an integrated deblurring step, with a relative improvement of [Formula: see text] in terms of the Sørensen-Dice coefficient. CONCLUSION: Our combined deblurring and segmentation model DeBlurWeedSeg is able to accurately segment weeds from sorghum and background, in both sharp as well as motion blurred drone captures. This has high practical implications, as lower error rates in weed and crop segmentation could lead to better weed control, e.g. when using robots for mechanical weed removal.
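
The Sørensen-Dice coefficient mentioned in this abstract measures the overlap between a predicted and a reference segmentation. The sketch below assumes a single binary mask per image, represented as nested lists, which is a simplification of the multi-class weed, sorghum and background setting in the paper. The function name and the convention for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(predicted, target):
    """Sørensen-Dice overlap between two binary masks given as nested lists."""
    pred_flat = [bool(v) for row in predicted for v in row]
    targ_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(targ_flat):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred_flat, targ_flat) if p and t)
    total = sum(pred_flat) + sum(targ_flat)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

A value of 1.0 means the masks coincide and 0.0 means they share no foreground pixel.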

18.
NAR Genom Bioinform ; 5(4): lqad087, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37829176

ABSTRACT

Protein thermostability is important in many areas of biotechnology, including enzyme engineering and protein-hybrid optoelectronics. Ever-growing protein databases and information on stability at different temperatures allow the training of machine learning models to predict whether proteins are thermophilic. In silico predictions could reduce costs and accelerate the development process by guiding researchers to more promising candidates. Existing models for predicting protein thermophilicity rely mainly on features derived from physicochemical properties. Recently, modern protein language models that directly use sequence information have demonstrated superior performance in several tasks. In this study, we evaluate the usefulness of protein language model embeddings for thermophilicity prediction with ProLaTherm, a Protein Language model-based Thermophilicity predictor. ProLaTherm significantly outperforms all feature-, sequence- and literature-based comparison partners on multiple evaluation metrics. In terms of the Matthews correlation coefficient, ProLaTherm outperforms the second-best competitor by 18.1% in a nested cross-validation setup. Using proteins from species not overlapping with species from the training data, ProLaTherm outperforms all competitors by at least 9.7%. On these data, it misclassified only one nonthermophilic protein as thermophilic. Furthermore, it correctly identified 97.4% of all thermophilic proteins in our test set with an optimal growth temperature above 70°C.
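
The Matthews correlation coefficient reported in this abstract summarizes a binary classifier from its confusion-matrix counts. A minimal sketch follows; the function name is hypothetical, and returning 0.0 when a row or column of the confusion matrix is empty is a common convention assumed here rather than something stated in the abstract.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a class is never predicted or never present
    return (tp * tn - fp * fn) / denominator
```

The score is 1.0 for perfect predictions, 0.0 for predictions no better than chance, and -1.0 for completely inverted predictions, which makes it informative even when thermophilic and nonthermophilic proteins are unevenly represented.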

19.
Comput Struct Biotechnol J ; 20: 2699-2712, 2022.
Article in English | MEDLINE | ID: mdl-35685359

ABSTRACT

Physically interacting proteins form macromolecule complexes that drive diverse cellular processes. Advances in experimental techniques that capture interactions between proteins provide us with protein-protein interaction (PPI) networks from several model organisms. These datasets have enabled the prediction and other computational analyses of protein complexes. Here we provide a systematic review of the state-of-the-art algorithms for protein complex prediction from PPI networks proposed in the past two decades. The existing approaches that solve this problem are categorized into three groups: cluster-quality-based, node affinity-based, and network embedding-based approaches, and we compare and contrast their advantages and disadvantages. We further include a comparative analysis by computing the performance of eighteen methods based on twelve well-established performance measures on four widely used benchmark protein-protein interaction networks. Finally, the limitations and drawbacks of both current data and approaches, along with potential solutions, are discussed, with emphasis on the points that pave the way for future research efforts in this field.

20.
NAR Genom Bioinform ; 4(3): lqac074, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36186922

ABSTRACT

Transcriptional-translational coupling is accepted to be a fundamental mechanism of gene expression in prokaryotes and therefore has been analyzed in detail. However, the underlying genomic architecture of the expression machinery has not been well investigated so far. In this study, we established a bioinformatics pipeline to systematically investigate >1800 bacterial genomes for the abundance of transcription- and translation-associated genes clustered in distinct gene cassettes. We identified three highly frequent cassettes containing transcriptional and translational genes, i.e. rplk-nusG (gene cassette 1; in 553 genomes), rpoA-rplQ-rpsD-rpsK-rpsM (gene cassette 2; in 656 genomes) and nusA-infB (gene cassette 3; in 877 genomes). Interestingly, each of the three cassettes harbors a gene (nusG, rpsD and nusA) encoding a protein which links transcription and translation in bacteria. The analyses suggest an enrichment of these cassettes in pathogenic bacterial phyla with >70% for cassette 3 (i.e. Neisseria, Salmonella and Escherichia) and >50% for cassette 1 (i.e. Treponema, Prevotella, Leptospira and Fusobacterium) and cassette 2 (i.e. Helicobacter, Campylobacter, Treponema and Prevotella). These insights form the basis to analyze the transcriptional regulatory mechanisms orchestrating transcriptional-translational coupling and might open novel avenues for future biotechnological approaches.
