ABSTRACT
Neanderthal and Denisovan hybridisation with modern humans has generated a non-random genomic distribution of introgressed regions, the result of drift and selection dynamics. Cross-species genomic incompatibility and more efficient removal of slightly deleterious archaic variants have been proposed as selection-based processes involved in the post-hybridisation purge of archaic introgressed regions. Both scenarios require the presence of functionally different alleles across Homo species onto which selection operated differently according to which populations hosted them, but only a few of these variants have been pinpointed so far. In order to identify functionally divergent archaic variants removed in humans, we focused on mitonuclear genes, which are underrepresented in the genomic landscape of archaic humans. We searched for non-synonymous, fixed, archaic-derived variants present in mitonuclear genes, rare or absent in human populations. We then compared the functional impact of archaic and human variants in the model organism Saccharomyces cerevisiae. Notably, a variant within the mitochondrial tyrosyl-tRNA synthetase 2 (YARS2) gene exhibited a significant decrease in respiratory activity and a substantial reduction of Cox2 levels, a proxy for mitochondrial protein biosynthesis, coupled with the accumulation of the YARS2 protein precursor and a lower amount of mature enzyme. Our work suggests that this variant is associated with mitochondrial functionality impairment, thus contributing to the purging of archaic introgression in YARS2. While different molecular mechanisms may have impacted other mitonuclear genes, our approach can be extended to the functional screening of mitonuclear genetic variants present across species and populations.
Subjects
Neanderthals , Saccharomyces cerevisiae , Humans , Saccharomyces cerevisiae/genetics , Neanderthals/genetics , Animals , Genetic Variation , Mitochondria/genetics , Mitochondria/metabolism , Alleles , Genetic Introgression , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
MOTIVATION: Mutational signatures are a critical component in deciphering the genetic alterations that underlie cancer development and have become a valuable resource to understand the genomic changes during tumorigenesis. Therefore, it is essential to employ precise and accurate methods for their extraction to ensure that the underlying patterns are reliably identified and can be effectively utilized in new strategies for diagnosis, prognosis, and treatment of cancer patients. RESULTS: We present MUSE-XAE, a novel method for mutational signature extraction from cancer genomes using an explainable autoencoder. Our approach employs a hybrid architecture consisting of a nonlinear encoder that can capture nonlinear interactions among features, and a linear decoder which ensures the interpretability of the active signatures. We evaluated and compared MUSE-XAE with other available tools on both synthetic and real cancer datasets and demonstrated that it achieves superior performance in terms of precision and sensitivity in recovering mutational signature profiles. MUSE-XAE extracts highly discriminative mutational signature profiles by enhancing the classification of primary tumour types and subtypes in real world settings. This approach could facilitate further research in this area, with neural networks playing a critical role in advancing our understanding of cancer genomics. AVAILABILITY AND IMPLEMENTATION: MUSE-XAE software is freely available at https://github.com/compbiomed-unito/MUSE-XAE.
Subjects
Mutation , Neoplasms , Humans , Neoplasms/genetics , Algorithms , Software , Genomics/methods , Computational Biology/methods , Neural Networks, Computer
ABSTRACT
Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and for understanding genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and 'all' available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21-0.5 and 0-0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51-0.62. The tested methods seem relatively insensitive to the physiological conditions, also performing well on the variants measured at more extreme pH and temperature values. A common issue with all the tested methods is the compression of the $\Delta \Delta G$ predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
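The antisymmetry check described in this abstract can be illustrated with a minimal sketch. The abstract does not say which antisymmetry measures were used, so the two quantities below (the Pearson correlation between direct and reverse predictions, ideally -1, and the mean bias, ideally 0) are an assumption based on common practice in this field. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def antisymmetry_metrics(direct_pred, reverse_pred):
    """Return (r_dr, bias) for paired direct/reverse predictions.

    r_dr : correlation between direct and reverse predictions (ideal: -1).
    bias : mean of (direct + reverse) / 2 over all pairs (ideal: 0).
    """
    r_dr = pearson(direct_pred, reverse_pred)
    n = len(direct_pred)
    bias = sum(d + r for d, r in zip(direct_pred, reverse_pred)) / (2 * n)
    return r_dr, bias


# Toy predictions (kcal/mol) for three hypothetical variants.
direct = [1.0, -0.5, 2.0]
reverse = [-1.0, 0.5, -2.0]  # a perfectly antisymmetric predictor
r_dr, bias = antisymmetry_metrics(direct, reverse)
print(f"r_dr = {r_dr:.2f}, bias = {bias:.2f}")
```

A predictor that shifts every reverse prediction by a constant keeps r_dr at -1 but shows a non-zero bias, which is why the two numbers are usually read together.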
Subjects
Point Mutation , Proteins , Mutation , Protein Stability , Proteins/chemistry , Thermodynamics
ABSTRACT
Micro-RNAs (miRNAs) are involved in the modulation of viral replication and host immune antiviral response. Using next-generation sequencing, we investigated the miRNome profile of circulating extracellular vesicles in 20 patients with chronic hepatitis D virus (HDV) infection undergoing pegylated interferon alpha (Peg-IFNα) treatment. Circulating miRNAs' expression was analysed according to virologic response (i.e., HDV RNA clearance maintained at least 6 months after the end of therapy). Overall, 8 patients (40%) achieved a virologic response to Peg-IFNα treatment. At baseline, 14 miRNAs were differentially expressed between responders and non-responders; after 6 months of Peg-IFNα treatment, 7 miRNAs (miR-155-5p, miR-1246, miR-423-3p, miR-760, miR-744-5p, miR-1307-3p and miR-146a-5p) were consistently de-regulated. Among de-regulated miRNAs, miR-155-5p showed an inverse correlation with HDV RNA (at baseline: rs = -0.39, p = 0.092; at 6 months: rs = -0.53, p = 0.016) and hepatitis B surface antigen (HBsAg) (at baseline: rs = -0.49, p = 0.028; at 6 months: rs = -0.71, p < 0.001). In logistic regression analysis, both miR-155-5p (at baseline: OR = 4.52, p = 0.022; at 6 months: OR = 5.30, p = 0.029) and HDV RNA (at baseline: OR = 0.19, p = 0.022; at 6 months: OR = 0.38, p = 0.018) were significantly associated with virologic response. Considering that Peg-IFNα still has a relevant role in the treatment of patients with chronic hepatitis D infection, the assessment of EV miR-155-5p may represent an additional valuable tool for the management of HDV patients undergoing Peg-IFNα treatment.
Subjects
Antiviral Agents , Extracellular Vesicles , Hepatitis D, Chronic , Interferon-alpha , MicroRNAs , Humans , Male , Interferon-alpha/therapeutic use , Extracellular Vesicles/metabolism , Female , Hepatitis D, Chronic/drug therapy , Adult , Antiviral Agents/therapeutic use , Middle Aged , MicroRNAs/blood , MicroRNAs/genetics , Treatment Outcome , Hepatitis Delta Virus/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing
ABSTRACT
Genetic markers (especially short tandem repeats or STRs) located on the X chromosome are a valuable resource to solve complex kinship cases in forensic genetics in addition or alternatively to autosomal STRs. Groups of tightly linked markers are combined into haplotypes, thus increasing the discriminating power of tests. However, this approach requires precise knowledge of the recombination rates between adjacent markers. The International Society of Forensic Genetics recommends that recombination rate estimation on the X chromosome is performed from pedigree genetic data while taking into account the confounding effect of mutations. However, implementations that satisfy these requirements have several drawbacks: they were never publicly released, they are very slow and/or need cluster-level hardware and strong computational expertise to use. In order to address these key concerns we developed Recombulator-X, a new open-source Python tool. The most challenging issue, namely the running time, was addressed with dynamic programming techniques to greatly reduce the computational complexity of the algorithm. Compared to the previous methods, Recombulator-X reduces the estimation times from weeks or months to less than one hour for typical datasets. Moreover, the estimation process, including preprocessing, has been streamlined and packaged into a simple command-line tool that can be run on a normal PC. Where previous approaches were limited to small panels of STR markers (up to 15), our tool can handle greater numbers (up to 100) of mixed STR and non-STR markers. In conclusion, Recombulator-X makes the estimation process much simpler, faster and accessible to researchers without a computational background, hopefully spurring increased adoption of best practices.
ABSTRACT
Estimating the functional effect of single amino acid variants in proteins is fundamental for predicting the change in the thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding, between the wild-type and the variant protein (ΔΔG). Here, we present the web server of the DDGun method, which was previously developed for the ΔΔG prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. It is antisymmetric, as it predicts opposite ΔΔG values for direct (A → B) and reverse (B → A) single and multiple site variants. DDGun is available in two versions, one based on only sequence information and the other one based on sequence and structure information. Despite being untrained, DDGun reaches prediction performance comparable to that of trained methods. Here we make DDGun available as a web server. For the web server version, we updated the protein sequence database used for the computation of the evolutionary features, and we compiled two new data sets of protein variants to do a blind test of its performance. On these blind data sets of single and multiple site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ΔΔG of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Besides being used for the prediction of ΔΔG, we suggest that DDGun should be adopted as a benchmark method to assess the predictive capabilities of newly developed methods. Releasing DDGun as a web server, stand-alone program and docker image will facilitate the necessary process of method comparison to improve ΔΔG prediction.
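The antisymmetry property stated in this abstract can be written out explicitly. The sign convention below (variant minus reference, in terms of unfolding free energy) is an assumption, since the abstract does not fix one; the symbols $\Delta G_A$ and $\Delta G_B$ are introduced here only for illustration.

$$
\Delta\Delta G_{A \to B} = \Delta G_B - \Delta G_A,
\qquad
\Delta\Delta G_{B \to A} = \Delta G_A - \Delta G_B = -\,\Delta\Delta G_{A \to B}
$$

Here $\Delta G_A$ and $\Delta G_B$ denote the unfolding free energies of the two protein forms. Whatever sign convention is chosen, an antisymmetric predictor must return values for the direct and reverse variants that sum to zero.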
Subjects
Amino Acids , Protein Stability , Proteins , Amino Acids/genetics , Computers , Databases, Protein , Proteins/genetics , Proteins/chemistry
ABSTRACT
BACKGROUND AND AIMS: Nonalcoholic fatty liver disease (NAFLD) is a complex disease, resulting from the interplay between environmental determinants and genetic variations. Single nucleotide polymorphism rs738409 C>G in the PNPLA3 gene is associated with hepatic fibrosis and with higher risk of developing hepatocellular carcinoma. Here, we analyzed a longitudinal cohort of biopsy-proven NAFLD subjects with the aim of identifying individuals in whom genetics may have a stronger impact on disease progression. METHODS: We retrospectively analyzed 756 consecutive, prospectively enrolled biopsy-proven NAFLD subjects from Italy, United Kingdom, and Spain who were followed for a median of 84 months (interquartile range, 65-109 months). We stratified the study cohort according to sex, body mass index (BMI ≥30 kg/m2), and age (≥50 years). Liver-related events (hepatic decompensation, hepatic encephalopathy, esophageal variceal bleeding, and hepatocellular carcinoma) were recorded during the follow-up and the log-rank test was used to compare groups. RESULTS: Overall, the median age was 48 years and most individuals were men (64.7%). The PNPLA3 rs738409 genotype was CC in 235 (31.1%), CG in 328 (43.4%), and GG in 193 (25.5%) patients. In univariate analysis, the PNPLA3 GG risk genotype was associated with female sex and inversely related to BMI (odds ratio, 1.6; 95% confidence interval, 1.1-2.2; P = .006; and odds ratio, 0.97; 95% confidence interval, 0.94-0.99; P = .043, respectively). Specifically, PNPLA3 GG risk homozygosis was more prevalent in female vs male individuals (31.5% vs 22.3%; P = .006) and in nonobese compared with obese NAFLD subjects (50.0% vs 44.2%; P = .011). Following stratification for age, sex, and BMI, we observed an increased incidence of liver-related events in the subgroup of nonobese women older than 50 years of age carrying the PNPLA3 GG risk genotype (log-rank test, P = .0047).
CONCLUSIONS: Nonobese female patients with NAFLD 50 years of age and older, and carrying the PNPLA3 GG risk genotype, are at higher risk of developing liver-related events compared with those with the wild-type allele (CC/CG). This finding may have implications in clinical practice for risk stratification and personalized medicine.
Subjects
Carcinoma, Hepatocellular , Esophageal and Gastric Varices , Liver Neoplasms , Non-alcoholic Fatty Liver Disease , Humans , Female , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/complications , Non-alcoholic Fatty Liver Disease/genetics , Non-alcoholic Fatty Liver Disease/epidemiology , Carcinoma, Hepatocellular/epidemiology , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/complications , Retrospective Studies , Esophageal and Gastric Varices/complications , Gastrointestinal Hemorrhage/complications , Genotype , Polymorphism, Single Nucleotide , Liver Neoplasms/epidemiology , Liver Neoplasms/genetics , Liver Neoplasms/complications , Genetic Predisposition to Disease
ABSTRACT
MOTIVATION: Many genomics applications require the computation of nucleotide coverage of a reference genome or the ability to determine how many reads map to a reference region. RESULTS: BamToCov is a toolkit for rapid and flexible coverage computation that relies on the most memory efficient algorithm and is designed for integration in pipelines, given its ability to read alignment files from streams. The tools in the suite can process sorted BAM or CRAM files, allowing the user to extract coverage information via different filtering approaches and to save the output in different formats (BED, Wig or counts). The BamToCov algorithm can also handle strand-specific and/or physical coverage analyses. AVAILABILITY AND IMPLEMENTATION: This program, accessory utilities and their documentation are freely available at https://github.com/telatin/BamToCov. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
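To make the notion of nucleotide coverage concrete, here is a minimal sketch that turns alignment intervals on a single reference sequence into BED-like runs of constant depth. It is not the BamToCov algorithm and it does not parse BAM or CRAM files. The input is assumed to be a plain list of half-open, 0-based `(start, end)` pairs, and the function name `coverage_runs` is hypothetical.

```python
from collections import defaultdict


def coverage_runs(alignments):
    """Compute per-base depth as runs of constant coverage.

    alignments: iterable of (start, end) half-open, 0-based intervals
                on one reference sequence.
    Returns a list of (start, end, depth) tuples, omitting zero-depth gaps.
    """
    # Record +1 where an alignment starts and -1 where it ends.
    delta = defaultdict(int)
    for start, end in alignments:
        if end <= start:
            continue  # skip empty or malformed intervals
        delta[start] += 1
        delta[end] -= 1

    runs = []
    depth = 0
    prev = None
    # Sweep over breakpoints in order, emitting the run that just ended.
    for pos in sorted(delta):
        if prev is not None and depth > 0:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                # Same depth as the adjacent run: extend it instead.
                runs[-1] = (runs[-1][0], pos, depth)
            else:
                runs.append((prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return runs


# Three toy alignments: two overlapping, one isolated.
for run in coverage_runs([(0, 10), (5, 15), (20, 25)]):
    print(*run, sep="\t")
```

Only interval breakpoints are stored, never a per-base array, so memory grows with the number of distinct alignment boundaries rather than with the reference length.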
Subjects
Genomics , Software , Sequence Analysis, DNA , Algorithms , Documentation , High-Throughput Nucleotide Sequencing
ABSTRACT
OBJECTIVE: The full phenotypic expression of non-alcoholic fatty liver disease (NAFLD) in lean subjects is incompletely characterised. We aimed to investigate prevalence, characteristics and long-term prognosis of Caucasian lean subjects with NAFLD. DESIGN: The study cohort comprises 1339 biopsy-proven NAFLD subjects from four countries (Italy, UK, Spain and Australia), stratified into lean and non-lean (body mass index (BMI) ≥25 kg/m2). Liver/non-liver-related events and survival free of transplantation were recorded during the follow-up, compared by log-rank testing and reported by adjusted HR. RESULTS: Lean patients represented 14.4% of the cohort and were predominantly of Italian origin (89%). They had less severe histological disease (lean vs non-lean: non-alcoholic steatohepatitis 54.1% vs 71.2% p<0.001; advanced fibrosis 10.1% vs 25.2% p<0.001), lower prevalence of diabetes (9.2% vs 31.4%, p<0.001), but no significant differences in the prevalence of the PNPLA3 I148M variant (p=0.57). During a median follow-up of 94 months (>10 483 person-years), 4.7% of lean vs 7.7% of non-lean patients reported liver-related events (p=0.37). No difference in survival was observed compared with non-lean NAFLD (p=0.069). CONCLUSIONS: Caucasian lean subjects with NAFLD may progress to advanced liver disease, develop metabolic comorbidities and experience cardiovascular disease (CVD) as well as liver-related mortality, independent of longitudinal progression to obesity and PNPLA3 genotype. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD where the disease manifests at lower overall BMI thresholds. LAY SUMMARY: NAFLD may affect and progress in both obese and lean individuals. Lean subjects are predominantly males, have a younger age at diagnosis and are more prevalent in some geographic areas. During the follow-up, lean subjects can develop hepatic and extrahepatic disease, including metabolic comorbidities, in the absence of weight gain. 
These patients represent one end of a wide spectrum of phenotypic expression of NAFLD.
Subjects
Non-alcoholic Fatty Liver Disease/complications , Thinness/complications , White People , Adult , Body Mass Index , Cohort Studies , Female , Humans , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/mortality , Non-alcoholic Fatty Liver Disease/pathology , Prognosis , Survival Rate , Thinness/mortality , Thinness/pathology
ABSTRACT
BACKGROUND: Bladder cancer (BC) has the highest per-patient cost of all cancer types. Hence, we aim to develop a non-invasive, point-of-care tool for the diagnostic and molecular stratification of patients with BC based on combined microRNAs (miRNAs) and surface-enhanced Raman spectroscopy (SERS) profiling of urine. METHODS: Next-generation sequencing of the whole miRNome and SERS profiling were performed on urine samples collected from 15 patients with BC and 16 control subjects (CTRLs). A retrospective cohort (BC = 66 and CTRL = 50) and RT-qPCR were used to confirm the selected differently expressed miRNAs. Diagnostic accuracy was assessed using machine learning algorithms (logistic regression, naïve Bayes, and random forest), which were trained to discriminate between BC and CTRL, using as input either miRNAs, SERS, or both. The molecular stratification of BC based on miRNA and SERS profiling was performed to discriminate between high-grade and low-grade tumors and between luminal and basal types. RESULTS: Combining SERS data with three differentially expressed miRNAs (miR-34a-5p, miR-205-3p, miR-210-3p) yielded an Area Under the Curve (AUC) of 0.92 ± 0.06 in discriminating between BC and CTRL, an accuracy which was superior either to miRNAs (AUC = 0.84 ± 0.03) or SERS data (AUC = 0.84 ± 0.05) individually. When evaluating the classification accuracy for luminal and basal BC, the combination of miRNAs and SERS profiling averaged an AUC of 0.95 ± 0.03 across the three machine learning algorithms, again better than miRNA (AUC = 0.89 ± 0.04) or SERS (AUC = 0.92 ± 0.05) individually, although SERS alone performed better in terms of classification accuracy. CONCLUSION: miRNA profiling synergizes with SERS profiling for point-of-care diagnostic and molecular stratification of BC. By combining the two liquid biopsy methods, a clinically relevant tool that can aid BC patients is envisaged.
Subjects
MicroRNAs , Urinary Bladder Neoplasms , Bayes Theorem , Biomarkers, Tumor/genetics , Humans , Liquid Biopsy , MicroRNAs/genetics , Point-of-Care Systems , Retrospective Studies , Urinary Bladder Neoplasms/diagnosis , Urinary Bladder Neoplasms/genetics
ABSTRACT
To reconstruct the phenotypical and clinical implications of the Italian genetic structure, we thoroughly analyzed a whole-exome sequencing data set comprised of 1686 healthy Italian individuals. We found six previously unreported variants with remarkable frequency differences between Northern and Southern Italy in the HERC2, OR52R1, ADH1B, and THBS4 genes. We reported 36 clinically relevant variants (submitted as pathogenic, risk factors, or drug response in ClinVar) with significant frequency differences between Italy and Europe. We then explored putatively pathogenic variants in the Italian exome. On average, our Italian individuals carried 16.6 protein-truncating variants (PTVs), with 2.5% of the population having a PTV in one of the 59 American College of Medical Genetics (ACMG) actionable genes. Lastly, we looked for PTVs that are likely to cause Mendelian diseases. We found four heterozygous PTVs in haploinsufficient genes (KAT6A, PTCH1, and STXBP1) and three homozygous PTVs in genes causing recessive diseases (DPYD, FLG, and PYGM). Comparing frequencies from our data set to other public databases, like gnomAD, we showed the importance of population-specific databases for a more accurate assessment of variant pathogenicity. For this reason, we made aggregated frequencies from our data set publicly available as a tool for both clinicians and researchers (http://nigdb.cineca.it; NIG-ExIT).
Subjects
Exome , Genetic Variation , Europe , Exome/genetics , Humans , Italy , Exome Sequencing
ABSTRACT
The taxonomic composition of microbial communities can be assessed using universal marker amplicon sequencing. The most common taxonomic markers are the 16S rDNA for bacterial communities and the internal transcribed spacer (ITS) region for fungal communities, but various other markers are used for barcoding eukaryotes. A crucial step in the bioinformatic analysis of amplicon sequences is the identification of representative sequences. This can be achieved using a clustering approach or by denoising raw sequencing reads. DADA2 is a widely adopted algorithm, released as an R library, that denoises marker-specific amplicons from next-generation sequencing and produces a set of representative sequences referred to as 'Amplicon Sequence Variants' (ASV). Here, we present Dadaist2, a modular pipeline, providing a complete suite for the analysis that ranges from raw sequencing reads to the statistics of numerical ecology. Dadaist2 implements a new approach that is specifically optimised for amplicons with variable lengths, such as the fungal ITS. The pipeline focuses on streamlining the data flow from the command line to R, with multiple options for statistical analysis and plotting, both interactive and automatic.
Subjects
DNA Barcoding, Taxonomic/statistics & numerical data , Metagenomics/statistics & numerical data , Microbiota/genetics , Software , Algorithms , Cluster Analysis , Computational Biology/methods , Data Interpretation, Statistical , High-Throughput Nucleotide Sequencing , Metadata , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
ABSTRACT
The KMT2A/AFF1 rearrangement is associated with an unfavorable prognosis in infant acute lymphocytic leukemia (ALL). Discordant ALL in monozygotic twins is uncommon and represents an attractive resource to evaluate intrauterine environment-genetic interplay in ALL. Mutational and epigenetic profiles were characterized for a discordant KMT2A/AFF1-rearranged infant monozygotic twin pair and their parents, and they were compared to three independent KMT2A/AFF1-positive ALL infants, in which the DNA methylation and gene expression profiles were investigated. A de novo Q61H NRAS mutation was detected in the affected twin at diagnosis and backtracked in both twins at birth. The KMT2A/AFF1 rearrangement was absent at birth in both twins. Genetic analyses conducted at birth gave more insights into the timing of the mutation hit. We identified correlations between DNA methylation and gene expression changes for 32 genes in the three independent affected versus remitted patients. The strongest correlations were observed for the RAB32, PDK4, CXCL3, RANBP17, and MACROD2 genes. This epigenetic signature could be a putative target for the development of novel epigenetic-based therapies and could help in explaining the molecular mechanisms characterizing ALL infants with KMT2A/AFF1 fusions.
Subjects
DNA-Binding Proteins/genetics , Epigenesis, Genetic , Gene Expression Regulation , Gene Rearrangement , Histone-Lysine N-Methyltransferase/genetics , Myeloid-Lymphoid Leukemia Protein/genetics , Transcriptional Elongation Factors/genetics , Twins, Monozygotic/genetics , Alleles , Computational Biology/methods , CpG Islands , DNA Methylation , Epigenomics/methods , Female , Genotype , Humans , Infant, Newborn , Male , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Exome Sequencing
ABSTRACT
BACKGROUND: Whole genome and exome sequencing are contributing to the extraordinary progress in the study of human genetic variants. In this fast-developing field, appropriate and easily accessible tools are required to facilitate data analysis. RESULTS: Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Instead of being designed around specific datasets, it works on a general XML schema specifying formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features. Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy to use, and its intuitive interface allows users to handle different kinds of inheritance as well as features related to sharing variants in different patients. QueryOR is suitable for investigating single patients, families or cohorts. CONCLUSIONS: QueryOR is a comprehensive and flexible web platform suited to easy, user-driven variant prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/ .
Subjects
Databases, Genetic , Genetic Variation , Software , Disease/genetics , Exome , Genome, Human , Humans , Internet
ABSTRACT
Data sharing among different institutions represents one of the major challenges in developing distributed machine learning approaches, especially when data is sensitive, such as in medical applications. Federated learning is a possible solution, but requires fast communications and flawless security. Here, we propose SYNDSURV (SYNthetic Distributed SURVival), an alternative approach that simplifies the current state-of-the-art paradigm by allowing different centres to generate local simulated instances from real data and then gather them into a centralised hub, where an Artificial Intelligence (AI) model can learn in a standard way. The main advantage of this procedure is that it is model-agnostic, therefore prediction models can be directly applied in distributed applications without requiring particular adaptations as the current federated approaches do. To show the validity of our approach for medical applications, we tested it on a survival analysis task, offering a viable alternative to train AI models on distributed data. While federated learning has been mainly optimised for gradient-based approaches so far, our framework works with any predictive method, proving to be a comparable way of performing distributed learning without being too demanding towards each participating institute in terms of infrastructural requirements.
Subjects
Artificial Intelligence , Machine Learning , Survival Analysis
ABSTRACT
Missense variation in genomes can affect protein structure stability and, in turn, cell physiology. Predicting the impact of those variations is relevant, and the best-performing computational tools exploit the protein structure information. However, structures are unresolved for most current protein sequence variants, and comparative or ab initio tools can provide a model structure. Here, we evaluate the impact of model structures, compared to experimental structures, on the predictors of protein stability changes upon single-point mutations, where no significant changes are expected between the original and the mutated structures. We show that there are substantial differences among the computational tools. Methods that rely on coarse-grained representation are less sensitive to the underlying protein structures. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated from comparative modeling, even on single-residue substitutions.
Subjects
Computational Biology , Point Mutation , Computational Biology/methods , Proteins/metabolism , Protein Stability , Amino Acid Sequence
ABSTRACT
BACKGROUND: Small non-coding (snc)RNAs, including microRNAs and P-element induced wimpy testis (PIWI)-interacting-RNAs (piRNAs), crucially regulate gene expression in both physiological and pathological conditions. In particular, some muscle-specific microRNAs (myomiRs) have been implicated in the pathogenesis of cancer-induced muscle wasting. The aims of the present study were (i) to profile sncRNAs in both skeletal muscle and plasma of gastrointestinal cancer patients and (ii) to investigate the association between differentially expressed sncRNAs and the level of muscularity at body composition analysis. METHODS: Surgical patients with gastrointestinal cancer or benign disease were recruited. Blood samples and muscle biopsies (rectus abdominis) were collected during surgery. Low muscularity patients were those in the lowest tertile of skeletal muscle index (SMI; CT-scan), whereas moderate/high muscularity patients were in the middle and highest SMI tertiles. SncRNAs in the muscle were assessed by RNAseq; circulating microRNAs were evaluated by qPCR. RESULTS: Cancer patients (n = 25; 13 females, 52%) showed a mean age of 71.6 ± 11.2 years, a median body weight loss of 4.2% and a mean BMI of 27.0 ± 3.2 kg/m2. Control group (n = 15; 9 females, 60%) showed a mean age of 58.1 ± 13.9 years and a mean BMI of 28.0 ± 4.3 kg/m2. In cancer patients, the median L3-SMI (cm2/m2) was 42.52 (34.42-49.07). Males showed a median L3-SMI of 46.08 (41.17-51.79) and females a median L3-SMI of 40.77 (33.73-42.87). Moderate-high and low muscularity groups included 17 and 8 patients, respectively. As for circulating microRNAs, miR-21-5p and miR-133a-3p were up-regulated in patients compared with controls, whereas miR-15b-5p was down-regulated in the same comparison (about 30% of control values). Sample clustering by muscularity and sex revealed increased miR-133a-3p and miR-206 only in moderate-high muscularity males.
SncRNA profiling in the muscle identified 373 microRNAs and 190 piRNAs (72.5% and 18.7% of raw reads, respectively). As for microRNAs, 10 were up-regulated, and 56 were down-regulated in cancer patients versus controls. Among the 24 dysregulated piRNAs, the majority were down-regulated, including the top two most expressed piRNAs in the muscle (piR-12790 and piR-2106). Network analysis on validated mRNA targets of down-regulated microRNAs revealed miR-15b-5p, miR-106a-5p and miR-106b-5p as main interactors of genes related to ubiquitin ligase/transferase activities. CONCLUSIONS: These results show dysregulation of both muscle microRNAs and piRNAs in cancer patients compared with controls, the former following a sex-specific pattern. Changes in circulating microRNAs are associated with the degree of muscularity rather than body weight loss.
Subjects
Circulating MicroRNA, Gastrointestinal Neoplasms, MicroRNAs, Small Untranslated RNA, Male, Female, Humans, Middle Aged, Aged, Aged 80 and Over, Adult, Small Untranslated RNA/genetics, Piwi-Interacting RNA, Gene Expression Profiling, MicroRNAs/metabolism, Weight Loss
ABSTRACT
BACKGROUND: Spleen stiffness measurement (SSM) performed by transient elastography at 100 Hz is a novel technology for the evaluation of portal hypertension in advanced chronic liver disease, but data on its technical aspects are lacking. We aimed to evaluate the intraexamination variability of SSM and to determine the best transient elastography protocol for obtaining robust measurements to be used in clinical practice. METHODS: We analyzed 253 SSM exams with up to 20 scans for each examination, performed between April 2021 and June 2022. All SSM results were evaluated according to different protocols by dividing data into groups of n measurements (from 2 to 19). Taking the median SSM value across all 20 measurements as the reference, we calculated the distribution of the absolute deviations of each protocol from the reference median. This analysis was repeated 1,000 times by resampling the data. Distributions were also stratified by etiology (chronic liver disease versus clinically significant portal hypertension) and by SSM range: < 25 kPa, 25-75 kPa, and > 75 kPa. RESULTS: Overall, we observed that the spleen stiffness exam had less variability if it exceeded 12 measurements, i.e., absolute deviations ≤ 5 kPa at 95% confidence. For exams with higher SSM values (> 75 kPa), as seen in clinically significant portal hypertension, at least 15 measurements are strongly recommended. CONCLUSIONS: Fifteen scans per examination should be considered for each SSM exam performed at 100 Hz to achieve a low intraexamination variability within a reasonable time in clinical practice. RELEVANCE STATEMENT: Performing at least 15 scans per examination is recommended for 100 Hz SSM in order to achieve a low intraexamination variability, in particular for values > 75 kPa compatible with clinically significant portal hypertension. KEY POINTS: • Spleen stiffness measurement by transient elastography is used for stratification in patients with portal hypertension.
• At 100 Hz, this method may have intraexamination variability. • A minimum of 15 scans per examination achieves a low intraexamination variability.
Subjects
Elasticity Imaging Techniques, Portal Hypertension, Humans, Spleen/diagnostic imaging, Elasticity Imaging Techniques/methods, Portal Hypertension/diagnostic imaging
ABSTRACT
BACKGROUND: The 8q24 locus is enriched in cancer-associated polymorphisms and, despite containing relatively few protein-coding genes, it hosts the MYC oncogene and other genetic elements connected to tumorigenesis, including microRNAs (miRNAs). Research on miRNAs may provide insights into the transcriptomic regulation of this multiple cancer-associated region. MATERIAL AND METHODS: We profiled all miRNAs located in the 8q24 region in 120 colorectal cancer (CRC) patients and 80 controls. miRNA profiling was performed on cancer/non-malignant adjacent mucosa, stool, and plasma extracellular vesicles (EVs), and the results were validated with The Cancer Genome Atlas (TCGA) data. To verify whether the 8q24-annotated miRNAs altered in CRC were dysregulated in other cancers and biofluids, we evaluated their levels in bladder cancer (BC) cases from the TCGA dataset and in urine and plasma EVs from a set of BC cases and healthy controls. RESULTS: Among the mature miRNAs detected in the region, 12 were altered between CRC and adjacent mucosa (adj. p < 0.05). Five and four miRNAs were confirmed as dysregulated in the CRC and BC TCGA datasets, respectively. A co-expression analysis of tumor/adjacent tissue data from the CRC group revealed a correlation between the dysregulated miRNAs and CRC-related genes (PVT1 and MYC) annotated in the 8q24 region. miR-30d-5p and miR-151a-3p, altered in CRC tissue, were also dysregulated in stool of CRC patients and urine of BC cases, respectively. Functional enrichment of dysregulated miRNA target genes highlighted terms related to TP53-mediated cell cycle control. CONCLUSIONS: Altered expression of 8q24-annotated miRNAs may be relevant for the initiation and/or progression of cancer.
Subjects
Colorectal Neoplasms, MicroRNAs, Urinary Bladder Neoplasms, Humans, MicroRNAs/metabolism, Gene Expression Profiling/methods, Transcriptome, Urinary Bladder Neoplasms/genetics, Colorectal Neoplasms/pathology, Neoplastic Gene Expression Regulation, Tumor Biomarkers/metabolism
ABSTRACT
BACKGROUND: Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterised by the progressive loss of motor neurons in the brain and spinal cord. The highly heterogeneous course of ALS and its incompletely known determinants, combined with its relatively low prevalence, make the successful application of artificial intelligence (AI) techniques particularly arduous. OBJECTIVE: This systematic review aims to identify areas of agreement and unanswered questions regarding two notable applications of AI in ALS, namely the automatic, data-driven stratification of patients according to their phenotype, and the prediction of ALS progression. Unlike previous works, this review focuses on the methodological landscape of AI in ALS. METHODS: We conducted a systematic search of the Scopus and PubMed databases, looking for studies on data-driven stratification methods based on unsupervised techniques resulting in (A) automatic group discovery or (B) a transformation of the feature space allowing patient subgroups to be identified; and for studies on internally or externally validated methods for the prediction of ALS progression. We described the selected studies according to the following characteristics, when applicable: variables used, methodology, splitting criteria and number of groups, prediction outcomes, validation schemes, and metrics. RESULTS: Of the starting 1604 unique reports (2837 combined hits between Scopus and PubMed), 239 were selected for thorough screening, leading to the inclusion of 15 studies on patient stratification, 28 on prediction of ALS progression, and 6 on both stratification and prediction. In terms of variables used, most stratification and prediction studies included demographics and features derived from the ALSFRS or ALSFRS-R scores, which were also the main prediction targets.
The most represented stratification methods were K-means, hierarchical, and expectation-maximisation clustering, while random forests, logistic regression, the Cox proportional hazards model, and various flavours of deep learning were the most widely used prediction methods. Unexpectedly, predictive model validation was rarely performed in absolute terms (leading to the exclusion of 78 eligible studies), with the overwhelming majority of included studies resorting to internal validation only. CONCLUSION: This systematic review highlighted a general agreement in terms of input variable selection for both stratification and prediction of ALS progression, and in terms of prediction targets. A striking lack of validated models emerged, as well as a general difficulty in reproducing many published studies, mainly due to the absence of the corresponding parameter lists. While deep learning seems promising for prediction applications, its superiority over traditional methods has not been established; there is, instead, ample room for its application in the subfield of patient stratification. Finally, an open question remains on the role of new environmental and behavioural variables collected via novel, real-time sensors.