ABSTRACT
Increased attention to the rehabilitation needs of children with cancer is vital to enhance health, quality-of-life, and productivity outcomes. Among adults with cancer, rehabilitation recommendations are frequently incorporated into guidelines, but the extent to which recommendations exist for children is unknown. Reports included in this systematic review are guideline or expert consensus reports containing recommendations related to rehabilitation referral, evaluation, and/or intervention for individuals diagnosed with cancer during childhood (younger than 18 years). Eligible reports were published in English from January 2000 to August 2022. Through database searches, 42,982 records were identified; 62 records were identified through citation and website searching. Twenty-eight reports were included in the review: 18 guidelines and 10 expert consensus reports. Rehabilitation recommendations were identified in disease-specific (e.g., acute lymphoblastic leukemia), impairment-specific (e.g., fatigue, neurocognition, pain), adolescent and young adult, and long-term follow-up reports. Example recommendations included physical activity and energy-conservation techniques to address fatigue, referral to physical therapy for chronic pain management, ongoing psychosocial surveillance, and referral to speech-language pathology for those with hearing loss. High-level evidence supported rehabilitation recommendations for long-term follow-up care, fatigue, and psychosocial/mental health screening. Few intervention recommendations were included in guideline and consensus reports. In this developing field, it is critical to include pediatric oncology rehabilitation providers in guideline and consensus development initiatives. This review enhances the availability and clarity of rehabilitation-relevant guidelines that can help prevent and mitigate cancer-related disability among children by supporting access to rehabilitation services.
Subjects
Exercise, Neoplasms, Adolescent, Humans, Child, Consensus, Delivery of Health Care, Medical Oncology
ABSTRACT
Guidelines promote high-quality cancer care. Rehabilitation recommendations in oncology guidelines have not been characterized and may provide insight to improve integration of rehabilitation into oncology care. This report was developed as a part of the World Health Organization (WHO) Rehabilitation 2030 initiative to identify rehabilitation-specific recommendations in guidelines for oncology care. A systematic review of guidelines was conducted. Only guidelines published in English, for adults with cancer, providing recommendations for rehabilitation referral and assessment or interventions between 2009 and 2019 were included. A total of 13,840 articles were identified. After removal of duplicates and application of filters, 4,897 articles were screened. Sixty-nine guidelines were identified with rehabilitation-specific recommendations. Thirty-seven of the 69 guidelines endorsed referral to rehabilitation services but provided no specific recommendations regarding assessment or interventions. Thirty-two of the 69 guidelines met the full inclusion criteria and were assessed using the AGREE II tool. Twenty-one of these guidelines achieved an AGREE II quality score of ≥ 45 and were fully extracted. Guidelines exclusive to pharmacologic interventions and complementary and alternative interventions were excluded. Findings identify guidelines that recommend rehabilitation services across many cancer types and for various consequences of cancer treatment, signifying that rehabilitation is a recognized component of oncology care. However, these findings are at odds with clinical reports of low rehabilitation utilization rates, suggesting that guideline recommendations may be overlooked. Considering that functional morbidity negatively affects a majority of cancer survivors, improving guideline-concordant rehabilitative care could have substantial impact on function and quality of life among cancer survivors.
Subjects
Exercise Therapy/standards, Medical Oncology/standards, Neoplasms/rehabilitation, Practice Guidelines as Topic, Quality of Life, Cancer Survivors/psychology, Exercise Therapy/methods, Humans, Medical Oncology/methods, Neoplasms/complications, Neoplasms/psychology, Survivorship
ABSTRACT
Dynamic protein phosphorylation and dephosphorylation are essential regulatory mechanisms that ensure proper cellular signaling and biological functions. Deregulation of either reaction has been implicated in several human diseases. Here, we focus on the mechanisms that govern the specificity of the dephosphorylation reaction. Most cellular serine/threonine dephosphorylation is catalyzed by 13 highly conserved phosphoprotein phosphatase (PPP) catalytic subunits, which form hundreds of holoenzymes by binding to regulatory and scaffolding subunits. PPP holoenzymes recognize phosphorylation site consensus motifs and interact with short linear motifs (SLiMs) or structural elements distal to the phosphorylation site. We review recent advances in understanding the mechanisms of PPP site-specific dephosphorylation preference and substrate recruitment and highlight examples of their interplay in the regulation of cell division.
Subjects
Phosphoprotein Phosphatases, Humans, Phosphorylation, Phosphoprotein Phosphatases/metabolism, Catalytic Domain, Holoenzymes/chemistry, Holoenzymes/metabolism, Substrate Specificity
ABSTRACT
The C-terminal domain (CTD) of RNA polymerase II (Pol II) is composed of repeats of the consensus YSPTSPS and is an essential binding scaffold for transcription-associated factors. Metazoan CTDs have well-conserved lengths and sequence compositions arising from the evolution of divergent motifs, features thought to be essential for development. On the contrary, we show that a truncated CTD composed solely of YSPTSPS repeats supports Drosophila viability but that a CTD with enough YSPTSPS repeats to match the length of the wild-type Drosophila CTD is defective. Furthermore, a fluorescently tagged CTD lacking the rest of Pol II dynamically enters transcription compartments, indicating that the CTD functions as a signal sequence. However, CTDs with too many YSPTSPS repeats are more prone to localize to static nuclear foci separate from the chromosomes. We propose that the sequence complexity of the CTD offsets aberrant behavior caused by excessive repetitive sequences without compromising its targeting function.
Subjects
Amino Acid Motifs, Consensus Sequence, Drosophila Proteins/metabolism, Drosophila melanogaster/enzymology, RNA Polymerase II/metabolism, Repetitive Sequences, Amino Acid, Salivary Glands/enzymology, Animals, Animals, Genetically Modified, Drosophila Proteins/chemistry, Drosophila Proteins/genetics, Drosophila melanogaster/embryology, Drosophila melanogaster/genetics, Gene Expression Regulation, Developmental, Mutation, Protein Domains, RNA Polymerase II/chemistry, RNA Polymerase II/genetics, Salivary Glands/embryology, Transcription, Genetic, Transcriptional Activation
ABSTRACT
Understanding natural protein evolution and designing novel proteins are motivating interest in development of high-throughput methods to explore large sequence spaces. In this work, we demonstrate the application of multisite λ dynamics (MSλD), a rigorous free energy simulation method, and chemical denaturation experiments to quantify evolutionary selection pressure from sequence-stability relationships and to address questions of design. This study examines a mesophilic phylogenetic clade of ribonuclease H (RNase H), furthering its extensive characterization in earlier studies, focusing on E. coli RNase H (ecRNH) and a more stable consensus sequence (AncCcons) differing at 15 positions. The stabilities of 32,768 chimeras between these two sequences were computed using the MSλD framework. The most stable and least stable chimeras were predicted and tested along with several other sequences, revealing a designed chimera with approximately the same stability increase as AncCcons, but requiring only half the mutations. Comparing the computed stabilities with experiment for 12 sequences reveals a Pearson correlation of 0.86 and root mean squared error of 1.18 kcal/mol, an unprecedented level of accuracy well beyond less rigorous computational design methods. We then quantified selection pressure using a simple evolutionary model in which sequences are selected according to the Boltzmann factor of their stability. Selection temperatures from 110 to 168 K are estimated in three ways by comparing experimental and computational results to evolutionary models. These estimates indicate selection pressure is high, which has implications for evolutionary dynamics and for the accuracy required for design, and suggests accurate high-throughput computational methods like MSλD may enable more effective protein design.
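The two quantitative ideas in this abstract can be illustrated with a minimal sketch. It first checks that two parent sequences differing at 15 positions give 2^15 = 32,768 chimeras, then applies a Boltzmann-style selection weight. The function names, the example stabilities, and the sign convention (folding free energy in kcal/mol, more negative meaning more stable) are assumptions made for illustration; they are not taken from the study or from MSλD.

```python
import math

# Gas constant in kcal/(mol*K), so energies can be given in kcal/mol.
R_KCAL = 0.0019872041

def count_chimeras(n_positions: int) -> int:
    """Each differing position takes the residue of either parent: 2**n chimeras."""
    return 2 ** n_positions

def boltzmann_weights(dg_fold, temperature):
    """Normalized weights proportional to exp(-dG_fold / (R*T)).

    dg_fold: folding free energies in kcal/mol (assumed convention:
             more negative = more stable).
    temperature: selection temperature in kelvin.
    """
    beta = 1.0 / (R_KCAL * temperature)
    lowest = min(dg_fold)  # subtract the minimum for numerical stability
    factors = [math.exp(-beta * (dg - lowest)) for dg in dg_fold]
    total = sum(factors)
    return [f / total for f in factors]

n_chimeras = count_chimeras(15)

# Hypothetical stabilities for three sequences (not data from the study).
example_dg = [-10.0, -9.0, -8.0]
cold = boltzmann_weights(example_dg, 110.0)  # lower selection temperature
warm = boltzmann_weights(example_dg, 168.0)  # higher selection temperature
```

In this toy model, a lower selection temperature concentrates more of the weight on the most stable sequence, which is the sense in which a low estimated temperature corresponds to strong selection pressure.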
Subjects
Escherichia coli, Ribonuclease H, Escherichia coli/genetics, Phylogeny, Computer Simulation, Consensus Sequence, Ribonuclease H/genetics
ABSTRACT
Nanopore sequencing technology offers longer read lengths and can potentially address the limitations of short-read sequencing, including long-range haplotype phasing and accurate variant calling. However, state-of-the-art approaches still leave room for improvement in single nucleotide variant (SNV) identification performance and computing resource usage. In this work, we introduce miniSNV, a lightweight SNV calling algorithm that simultaneously achieves high performance and yield. miniSNV utilizes known common variants in populations as variation backgrounds and leverages read pileup, read-based phasing, and consensus generation to identify and genotype SNVs for Oxford Nanopore Technologies (ONT) long reads. Benchmarks on real and simulated ONT data under various error profiles demonstrate that miniSNV has superior sensitivity and comparable accuracy on SNV detection and runs faster, with outstanding scalability and lower memory usage, than most state-of-the-art variant callers. miniSNV is available from https://github.com/CuiMiao-HIT/miniSNV.
Subjects
Algorithms, Nanopore Sequencing, Polymorphism, Single Nucleotide, Nanopore Sequencing/methods, Software, Humans, High-Throughput Nucleotide Sequencing/methods, Sequence Analysis, DNA/methods
ABSTRACT
With the increasing prevalence of age-related chronic diseases burdening healthcare systems, there is a pressing need for innovative management strategies. Our study focuses on the gut microbiota, essential for metabolic, nutritional, and immune functions, which undergoes significant changes with aging. These changes can impair intestinal function, leading to altered microbial diversity and composition that potentially influence health outcomes and disease progression. Using advanced metagenomic sequencing, we explore the potential of personalized probiotic supplements in 297 older adults by analyzing their gut microbiota. We identified distinctive Lactobacillus and Bifidobacterium signatures in the gut microbiota of older adults, revealing probiotic patterns associated with various population characteristics, microbial compositions, cognitive functions, and neuroimaging results. These insights suggest that tailored probiotic supplements, designed to match individual probiotic profiles, could offer an innovative method for addressing age-related diseases and functional declines. Our findings enhance the existing evidence base for probiotic use among older adults, highlighting the opportunity to create more targeted and effective probiotic strategies. However, additional research is required to validate our results and further assess the impact of precision probiotics on aging populations. Future studies should employ longitudinal designs and larger cohorts to conclusively demonstrate the benefits of tailored probiotic treatments.
Subjects
Aging, Dietary Supplements, Gastrointestinal Microbiome, Probiotics, Probiotics/therapeutic use, Probiotics/administration & dosage, Humans, Aged, Female, Male, Aged, 80 and over, Middle Aged, Lactobacillus/genetics, Metagenomics/methods, Bifidobacterium
ABSTRACT
Transposable elements (TEs) are major components of eukaryotic genomes and are implicated in a range of evolutionary processes. Yet, TE annotation and characterization remain challenging, particularly for nonspecialists, since existing pipelines are typically complicated to install, run, and extract data from. Current methods of automated TE annotation are also subject to issues that reduce overall quality, particularly (i) fragmented and overlapping TE annotations, leading to erroneous estimates of TE count and coverage, and (ii) repeat models represented by short sections of total TE length, with poor capture of 5' and 3' ends. To address these issues, we present Earl Grey, a fully automated TE annotation pipeline designed for user-friendly curation and annotation of TEs in eukaryotic genome assemblies. Using nine simulated genomes and an annotation of Drosophila melanogaster, we show that Earl Grey outperforms current widely used TE annotation methodologies in ameliorating the issues mentioned above while scoring highly in benchmarking for TE annotation and classification and being robust across genomic contexts. Earl Grey provides a comprehensive and fully automated TE annotation toolkit that provides researchers with paper-ready summary figures and outputs in standard formats compatible with other bioinformatics tools. Earl Grey has a modular format, with great scope for the inclusion of additional modules focused on further quality control and tailored analyses in future releases.
Subjects
DNA Transposable Elements, Drosophila melanogaster, Animals, DNA Transposable Elements/genetics, Molecular Sequence Annotation, Drosophila melanogaster/genetics, Genomics/methods, Computational Biology
ABSTRACT
Identification of potential targets for known bioactive compounds and novel synthetic analogs is of considerable significance. In silico target fishing (TF) has become an alternative strategy because of the expensive and laborious wet-lab experiments, explosive growth of bioactivity data and rapid development of high-throughput technologies. However, these TF methods are based on different algorithms, molecular representations and training datasets, which may lead to different results when predicting the same query molecules. This can be confusing for practitioners in practical applications. Therefore, this study systematically evaluated nine popular ligand-based TF methods based on target and ligand-target pair statistical strategies, which will help practitioners make choices among multiple TF methods. The evaluation results showed that SwissTargetPrediction was the best method to produce the most reliable predictions while enriching more targets. High-recall similarity ensemble approach (SEA) was able to find real targets for more compounds compared with other TF methods. Therefore, SwissTargetPrediction and SEA can be considered as primary selection methods in future studies. In addition, the results showed that k = 5 was the optimal number of experimental candidate targets. Finally, a novel ensemble TF method based on consensus voting is proposed to improve the prediction performance. The precision of the ensemble TF method outperforms the individual TF method, indicating that the ensemble TF method can more effectively identify real targets within a given top-k threshold. The results of this study can be used as a reference to guide practitioners in selecting the most effective methods in computational drug discovery.
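The consensus-voting idea mentioned at the end of this abstract can be sketched as follows. This is a minimal illustration, assuming each method returns a ranked list of targets and that a target is kept when enough methods place it in their top-k list. The function `consensus_vote`, the `min_votes` threshold, and the method and target lists are hypothetical and do not reproduce the ensemble described in the study.

```python
from collections import Counter

def consensus_vote(predictions, k=5, min_votes=2):
    """Rank targets by how many methods place them in their top-k list.

    predictions: dict mapping method name -> list of targets, best first.
    Returns (target, votes) pairs with votes >= min_votes, sorted by
    descending votes and then alphabetically for a deterministic order.
    """
    votes = Counter()
    for ranked in predictions.values():
        # set() so a method cannot vote twice for the same target
        votes.update(set(ranked[:k]))
    kept = [(t, v) for t, v in votes.items() if v >= min_votes]
    return sorted(kept, key=lambda tv: (-tv[1], tv[0]))

# Hypothetical ranked outputs; method names and rankings are placeholders.
example_predictions = {
    "method_a": ["EGFR", "ABL1", "SRC", "KIT", "JAK2", "MTOR"],
    "method_b": ["ABL1", "EGFR", "LCK", "KIT", "CDK2"],
    "method_c": ["EGFR", "CDK2", "ABL1", "BRAF", "MTOR"],
}
consensus = consensus_vote(example_predictions, k=5, min_votes=2)
```

With k = 5, `MTOR` receives only one vote here because it sits sixth in the first list, which shows how the top-k cutoff interacts with the vote threshold.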
Subjects
Algorithms, Ligands
ABSTRACT
Multi-omics data integration is a complex and challenging task in biomedical research. Consensus clustering, also known as meta-clustering or cluster ensembles, has become an increasingly popular downstream tool for phenotyping and endotyping using multiple omics and clinical data. However, current consensus clustering methods typically rely on ensembling clustering outputs with similar sample coverages (mathematical replicates), which may not reflect real-world data with varying sample coverages (biological replicates). To address this issue, we propose consensus clustering with missing labels (ccml), an R protocol for two-step consensus clustering that can handle unequal missing labels (i.e., multiple predictive labels with different sample coverages). First, the regular consensus weights are adjusted (normalized) by sample coverage; then a regular consensus clustering is performed to predict the optimal final cluster. We applied the ccml method to predict molecularly distinct groups based on 9-omics integration in the Karolinska COSMIC cohort, which investigates chronic obstructive pulmonary disease, and 24-omics handprint integrative subgrouping of adult asthma patients of the U-BIOPRED cohort. We propose ccml as a downstream toolkit for multi-omics integration analysis algorithms such as Similarity Network Fusion and robust clustering of clinical data to overcome the limitations posed by missing data, which is inevitable in human cohorts consisting of multiple data modalities. The ccml tool is available in the R language (https://CRAN.R-project.org/package=ccml, https://github.com/pulmonomics-lab/ccml, or https://github.com/ZhoulabCPH/ccml).
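One plausible reading of the coverage-normalization step is sketched below in Python rather than R. For every pair of samples, the number of input clusterings that place them together is divided by the number of clusterings that cover both. The function name and the toy labels are hypothetical, and the exact weighting used by the ccml package may differ.

```python
def coverage_normalized_consensus(labelings):
    """Consensus matrix where each entry is normalized by joint coverage.

    labelings: list of label lists, one per input clustering; None marks a
               sample that the clustering did not cover.
    Entry [i][j] = (# clusterings placing i and j together)
                   / (# clusterings covering both i and j),
    or 0.0 when no clustering covers both samples.
    """
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            together = 0
            covered = 0
            for labels in labelings:
                if labels[i] is None or labels[j] is None:
                    continue  # this clustering says nothing about the pair
                covered += 1
                if labels[i] == labels[j]:
                    together += 1
            matrix[i][j] = together / covered if covered else 0.0
    return matrix

# Hypothetical labels for four samples from three data layers.
example_labelings = [
    [0, 0, 1, 1],
    [0, 0, None, 1],
    [None, 1, 1, 0],
]
consensus_matrix = coverage_normalized_consensus(example_labelings)
```

The resulting matrix could then be passed to any standard clustering routine as a similarity matrix, which corresponds to the second step described in the abstract.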
Subjects
Asthma, Multiomics, Adult, Humans, Consensus, Cluster Analysis, Algorithms, Asthma/genetics
ABSTRACT
Thermostable proteins find their use in numerous biomedical and biotechnological applications. However, the computational design of stable proteins often results in single-point mutations with a limited effect on protein stability, and the construction of stable multiple-point mutants can prove difficult due to the possibility of antagonistic effects between individual mutations. The FireProt protocol enables the automated computational design of highly stable multiple-point mutants. FireProt 2.0 builds on top of the previously published FireProt web, retaining the original functionality and expanding it with several new stabilization strategies. FireProt 2.0 integrates the AlphaFold database and homology modeling for structure prediction, enabling calculations starting from a sequence. Multiple-point designs are constructed using the Bron-Kerbosch algorithm, minimizing the antagonistic effect between the individual mutations. Users can now limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein, or select rigidifying mutations based on B-factors. The evolution-based back-to-consensus strategy is complemented by ancestral sequence reconstruction. FireProt 2.0 is significantly faster, and a reworked graphical user interface broadens the tool's availability even to users with older hardware. FireProt 2.0 is freely available at http://loschmidt.chemi.muni.cz/fireprotweb.
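To show how the Bron-Kerbosch algorithm can be used to combine mutations, the sketch below treats mutations as graph nodes, connects every pair that is not flagged as antagonistic, and enumerates maximal cliques as candidate multiple-point designs. The graph construction, the helper `maximal_compatible_sets`, and the mutation names are assumptions for illustration; how FireProt 2.0 actually builds and scores its graph is not described in the abstract.

```python
def bron_kerbosch(r, p, x, adjacency, cliques):
    """Basic Bron-Kerbosch recursion (no pivoting) collecting maximal cliques."""
    if not p and not x:
        cliques.append(sorted(r))
        return
    for v in sorted(p):  # iterate over a sorted snapshot; p is rebound below
        bron_kerbosch(r | {v}, p & adjacency[v], x & adjacency[v],
                      adjacency, cliques)
        p = p - {v}
        x = x | {v}

def maximal_compatible_sets(mutations, antagonistic_pairs):
    """Maximal sets of mutations in which no pair is flagged as antagonistic."""
    blocked = {frozenset(pair) for pair in antagonistic_pairs}
    adjacency = {
        m: {o for o in mutations
            if o != m and frozenset((m, o)) not in blocked}
        for m in mutations
    }
    cliques = []
    bron_kerbosch(set(), set(mutations), set(), adjacency, cliques)
    return sorted(cliques)

# Hypothetical mutation names and one hypothetical antagonistic pair.
example_mutations = ["A12V", "G77A", "L45F", "S101P"]
example_conflicts = [("A12V", "G77A")]
designs = maximal_compatible_sets(example_mutations, example_conflicts)
```

Each returned set is a largest-possible combination that avoids the flagged pair; a real design pipeline would additionally rank these sets by predicted stability.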
Subjects
Algorithms, Proteins, Proteins/genetics, Proteins/chemistry, Mutation, Protein Stability, Internet
ABSTRACT
This meeting report presents a consensus on the biological aspects of lipid emulsions in parenteral nutrition, emphasizing the unanimous support for the integration of lipid emulsions, particularly those containing fish oil, owing to their many potential benefits beyond caloric provision. Lipid emulsions have evolved from simple energy sources to complex formulations designed to improve safety profiles and offer therapeutic benefits. The consensus highlights the critical role of omega-3 polyunsaturated fatty acids (PUFAs), notably eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), found in fish oil and other marine oils, for their anti-inflammatory properties, muscle mass preservation, and as precursors to the specialized pro-resolving mediators (SPMs). SPMs play a significant role in immune modulation, tissue repair, and the active resolution of inflammation without impairing host defense mechanisms. The panel's agreement underscores the importance of incorporating fish oil within clinical practices to facilitate recovery in conditions like surgery, critical illness, or immobility, while cautioning against therapies that might disrupt natural inflammation resolution processes. This consensus not only reaffirms the role of specific lipid components in enhancing patient outcomes, but also suggests a shift towards nutrition-based therapeutic strategies in clinical settings, advocating for the proactive evidence-based use of lipid emulsions enriched with omega-3 PUFAs. Furthermore, we should seek to apply our knowledge concerning DHA, EPA, and their SPM derivatives, to produce more informative randomized controlled trial protocols, thus allowing more authoritative clinical recommendations.
Subjects
Inflammation, Humans, Inflammation/metabolism, Fatty Acids, Omega-3/therapeutic use, Fatty Acids, Omega-3/metabolism, Muscle, Skeletal/metabolism, Muscle, Skeletal/drug effects, Eicosapentaenoic Acid/therapeutic use, Eicosapentaenoic Acid/pharmacology, Parenteral Nutrition/methods, Fish Oils/therapeutic use, Docosahexaenoic Acids/therapeutic use, Fat Emulsions, Intravenous/therapeutic use, Animals
ABSTRACT
Partial epithelial-mesenchymal transition (p-EMT) has recently been identified as a hybrid state consisting of cells with both epithelial and mesenchymal characteristics and is associated with the migration, metastasis, and chemoresistance of cancer cells. Here, we describe the induction of p-EMT in starved colorectal cancer (CRC) cells and identify a p-EMT gene signature that can predict prognosis. Functional characterisation of starvation-induced p-EMT in HCT116, DLD1, and HT29 cells showed changes in proliferation, morphology, and drug sensitivity, supported by in vivo studies using the chorioallantoic membrane model. An EMT-specific quantitative polymerase chain reaction (qPCR) array was used to screen for deregulated genes, leading to the establishment of an in silico gene signature that was correlated with poor disease-free survival in CRC patients along with the CRC consensus molecular subtype CMS4. Among the significantly deregulated p-EMT genes, a triple-gene signature consisting of SERPINE1, SOX10, and epidermal growth factor receptor (EGFR) was identified. Starvation-induced p-EMT was characterised by increased migratory potential and chemoresistance, as well as E-cadherin processing and internalisation. Both gene signature and E-cadherin alterations could be reversed by the proteasomal inhibitor MG132. Spatially resolving EGFR expression with high-resolution immunofluorescence imaging identified a proliferation stop in starved CRC cells caused by EGFR internalisation. In conclusion, we have gained insight into a previously undiscovered EMT mechanism that may become relevant when tumour cells are under nutrient stress, as seen in early stages of metastasis. Targeting this process of tumour cell dissemination might help to prevent EMT and overcome drug resistance. © 2024 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Subjects
Colorectal Neoplasms, Humans, Colorectal Neoplasms/pathology, Cell Proliferation, Epithelial-Mesenchymal Transition/genetics, ErbB Receptors, Cell Line, Tumor, Cadherins/genetics, Cadherins/metabolism, Cell Movement
ABSTRACT
Autism spectrum disorder stands as a multifaceted and heterogeneous neurodevelopmental condition. The utilization of functional magnetic resonance imaging to construct functional brain networks proves instrumental in comprehending the intricate interplay between brain activity and autism spectrum disorder, thereby elucidating the underlying pathogenesis at the cerebral level. Traditional functional brain networks, however, typically confine their examination to connectivity effects within a specific frequency band, disregarding potential connections among brain areas that span different frequency bands. To harness the full potential of interregional connections across diverse frequency bands within the brain, our study endeavors to develop a novel multi-frequency analysis method for constructing a comprehensive functional brain network that incorporates multiple frequencies. Specifically, our approach first decomposes functional magnetic resonance imaging signals into distinct frequency bands through wavelet transform. Subsequently, Pearson correlation is employed to generate the corresponding functional brain network and kernel for each frequency band. Finally, classification is performed by a multi-kernel support vector machine to preserve the connectivity effects within each band and the connectivity patterns shared among the different bands. Our proposed multi-frequency functional brain network method yielded notable results, achieving an accuracy of 89.1%, a sensitivity of 86.67%, and an area under the curve of 0.942 in a publicly available autism spectrum disorder dataset.
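The per-band network and kernel construction can be sketched in a few lines. The example assumes the wavelet decomposition has already been done, so each subject has one set of region time series per frequency band. It then computes Pearson-correlation features per band, a linear kernel per band, and a weighted sum of the kernels. The linear kernel, the equal weights, and the toy signals are assumptions; the abstract does not specify the kernel or the weighting scheme.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_features(region_signals):
    """Upper-triangle Pearson correlations between region time series."""
    n = len(region_signals)
    return [pearson(region_signals[i], region_signals[j])
            for i in range(n) for j in range(i + 1, n)]

def linear_kernel(features):
    """Gram matrix of dot products between subjects' feature vectors."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]

def combine_kernels(kernels, weights):
    """Weighted sum of per-band kernels (one simple multi-kernel scheme)."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Toy band-limited signals: 2 subjects x 3 regions x 5 time points per band.
low_band = [
    [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]],
    [[1, 3, 2, 5, 4], [2, 1, 4, 3, 6], [6, 5, 4, 3, 1]],
]
high_band = [
    [[0, 1, 0, -1, 0], [1, 0, -1, 0, 1], [0, -1, 0, 1, 0]],
    [[1, -1, 1, -1, 1], [0, 1, 0, -1, 0], [-1, 1, -1, 1, -1]],
]
band_features = [[connectivity_features(s) for s in band]
                 for band in (low_band, high_band)]
band_kernels = [linear_kernel(f) for f in band_features]
combined = combine_kernels(band_kernels, weights=[0.5, 0.5])
```

A combined Gram matrix of this kind is what a support vector machine with a precomputed kernel would take as input; the classifier itself is omitted to keep the sketch dependency-free.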
Subjects
Autism Spectrum Disorder, Brain, Connectome, Magnetic Resonance Imaging, Humans, Autism Spectrum Disorder/physiopathology, Autism Spectrum Disorder/diagnostic imaging, Connectome/methods, Magnetic Resonance Imaging/methods, Brain/diagnostic imaging, Brain/physiopathology, Male, Support Vector Machine, Female, Neural Pathways/physiopathology, Neural Pathways/diagnostic imaging, Young Adult, Nerve Net/diagnostic imaging, Nerve Net/physiopathology, Wavelet Analysis, Adult, Adolescent
ABSTRACT
BACKGROUND: Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on existing clustering literature to develop tools specific to single-cell data. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist, in general, there is little assurance that any given set of parameters will represent an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. RESULTS: Here, we propose Dune, a new method for optimizing the trade-off between the resolution of the clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize their concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging for reducing the number of clusters, in terms of replicability of the resultant merged clusters as well as concordance with ground truth. Dune is available as an R package on Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/Dune.html. CONCLUSIONS: Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces the reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.
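The iterative merging described in the RESULTS can be sketched as a greedy loop. At each step the sketch tries every pairwise cluster merge in every partition and keeps the single merge that most increases the mean concordance between partitions. Using the adjusted Rand index (ARI) as the concordance measure, the greedy search, and the toy partitions are assumptions of this Python illustration; the Dune R package should be consulted for the actual procedure.

```python
from itertools import combinations
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label lists of equal length."""
    pairs, rows, cols = {}, {}, {}
    for x, y in zip(a, b):
        pairs[(x, y)] = pairs.get((x, y), 0) + 1
        rows[x] = rows.get(x, 0) + 1
        cols[y] = cols.get(y, 0) + 1
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # e.g. both partitions are a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def mean_ari(partitions):
    scores = [adjusted_rand_index(p, q) for p, q in combinations(partitions, 2)]
    return sum(scores) / len(scores)

def merge_labels(labels, keep, drop):
    """Relabel every member of cluster `drop` as cluster `keep`."""
    return [keep if lab == drop else lab for lab in labels]

def greedy_merge(partitions):
    """Apply the best ARI-improving merge repeatedly until none remains."""
    partitions = [list(p) for p in partitions]
    best = mean_ari(partitions)
    while True:
        best_move = None
        for idx, labels in enumerate(partitions):
            for keep, drop in combinations(sorted(set(labels)), 2):
                trial = (partitions[:idx]
                         + [merge_labels(labels, keep, drop)]
                         + partitions[idx + 1:])
                score = mean_ari(trial)
                if score > best + 1e-12:
                    best, best_move = score, (idx, keep, drop)
        if best_move is None:
            return partitions, best
        idx, keep, drop = best_move
        partitions[idx] = merge_labels(partitions[idx], keep, drop)

# Toy input: the second partition over-splits the first three cells.
example_partitions = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 2, 2, 2]]
merged, final_score = greedy_merge(example_partitions)
```

Because merging stops as soon as no merge raises the mean ARI, the number of clusters is determined by agreement between partitions rather than by a user-supplied parameter, which is the point made in the abstract.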
Subjects
RNA-Seq, Single-Cell Analysis, Software, Single-Cell Analysis/methods, RNA-Seq/methods, Cluster Analysis, Algorithms, Sequence Analysis, RNA/methods, Humans, Transcriptome/genetics, Reproducibility of Results, Gene Expression Profiling/methods, Single-Cell Gene Expression Analysis
ABSTRACT
BACKGROUND: Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT: In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of SVs using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS: FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .
Subjects
Software, Humans, Deep Learning, Genomic Structural Variation, Sequence Analysis, DNA/methods, Algorithms, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) technologies offer fast and inexpensive identification of DNA sequences. Somatic sequencing is among the primary applications of NGS, where acquired (non-inherited) variants are identified by comparing diseased and healthy tissues from the same individual. Somatic mutations in genetic diseases such as cancer are tightly associated with genomic instability. Genomic instability increases heterogeneity, further complicating sequencing efforts, a task already challenged by the presence of short reads and repetitions in human DNA. This leads to low concordance among studies and limits reproducibility. This limitation is a significant problem since identified mutations in somatic sequencing are major biomarkers for diagnosis and the primary input of targeted therapies. Benchmarking studies were conducted to assess the error rates and increase reproducibility. Unfortunately, the number of somatic benchmarking sets is very limited due to difficulties in validating true somatic variants. Moreover, most NGS benchmarking studies are based on relatively simpler germline (inherited) sequencing. Recently, a comprehensive somatic sequencing benchmarking set was published by Sequencing Quality Control Phase 2 (SEQC2). We chose this dataset for our experiments because it is a well-validated, cancer-focused dataset that includes many tumor/normal biological replicates. Our study has two primary goals. The first goal is to determine how replicate-based consensus approaches can improve the accuracy of somatic variant detection systems. The second goal is to develop highly predictive machine learning (ML) models by employing replicate-based consensus variants as labels during the training phase. RESULTS: Ensemble approaches that combine alternative algorithms are relatively common; here, as an alternative, we study the performance enhancement potential of biological replicates. We first developed replicate-based consensus approaches that utilize the biological replicates available in this study to improve variant calling performance. Subsequently, we trained ML models using these biological replicates and achieved performance comparable to that of optimal ML models, i.e., those trained using high-confidence variants identified in advance. CONCLUSIONS: Our replicate-based consensus approach can be used to improve variant calling performance and develop efficient ML models. Given the relative ease of obtaining biological replicates, this strategy allows for the development of efficient ML models tailored to specific datasets or scenarios.
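A replicate-based consensus can be illustrated with a simple support threshold: a variant is kept when it is called in at least a chosen number of biological replicates. The function `replicate_consensus`, the threshold values, and the variant tuples are hypothetical; the consensus rules evaluated in the study are not detailed in the abstract.

```python
from collections import Counter

def replicate_consensus(replicate_calls, min_support):
    """Keep variants called in at least `min_support` replicates.

    replicate_calls: list of iterables of (chrom, pos, ref, alt) tuples,
                     one iterable per biological replicate.
    Returns the retained variants as a sorted list.
    """
    counts = Counter()
    for calls in replicate_calls:
        counts.update(set(calls))  # a replicate supports a variant at most once
    return sorted(v for v, c in counts.items() if c >= min_support)

# Hypothetical calls from three tumor/normal replicates.
example_calls = [
    [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
    [("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")],
    [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
]
majority = replicate_consensus(example_calls, min_support=2)
unanimous = replicate_consensus(example_calls, min_support=3)
```

Raising the threshold trades sensitivity for precision; the retained set could also serve as training labels, in the spirit of the second goal stated above.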
Subjects
Algorithms, Neoplasms, Humans, Reproducibility of Results, Exome Sequencing, Neoplasms/genetics, Genomic Instability, High-Throughput Nucleotide Sequencing
Abstract
Lung adenocarcinoma (LUAD) is a tumour characterized by high tumour heterogeneity. Although there are numerous prognostic and immunotherapeutic options available for LUAD, there is a dearth of precise, individualized treatment plans. We integrated mRNA, lncRNA, microRNA, methylation and mutation data from the TCGA database for LUAD. Utilizing ten clustering algorithms, we identified stable multi-omics consensus clusters (MOCs). These data were then amalgamated with ten machine learning approaches to develop a robust model capable of reliably identifying patient prognosis and predicting immunotherapy outcomes. Through ten clustering algorithms, two prognostically relevant MOCs were identified, with MOC2 showing more favourable outcomes. We subsequently constructed a MOCs-associated machine learning model (MOCM) based on eight MOCs-specific hub genes. Patients characterized by a lower MOCM score exhibited better overall survival and responses to immunotherapy. These findings were consistent across multiple datasets, and compared to many previously published LUAD biomarkers, our MOCM score demonstrated superior predictive performance. Notably, the low MOCM group was more inclined towards 'hot' tumours, characterized by higher levels of immune cell infiltration. Intriguingly, a significant positive correlation between GJB3 and the MOCM score (R = 0.77, p < 0.01) was discovered. Further experiments confirmed that GJB3 significantly enhances LUAD proliferation, invasion and migration, indicating its potential as a key target for LUAD treatment. Our developed MOCM score accurately predicts the prognosis of LUAD patients and identifies potential beneficiaries of immunotherapy, offering broad clinical applicability.
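The abstract does not describe how the ten clustering results were merged into consensus clusters. One common, generic way to combine several clusterings is a co-association (evidence accumulation) scheme, sketched below under that assumption: samples are grouped together when a majority of clusterings place them in the same cluster. The function names, the `threshold` parameter, and the toy labelings are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    return {
        (i, j): sum(labels[i] == labels[j] for labels in labelings) / len(labelings)
        for i, j in combinations(range(n), 2)
    }


def consensus_clusters(labelings, threshold=0.5):
    """Link samples whose co-association exceeds `threshold` (single linkage)."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in co_association(labelings).items():
        if score > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for sample in range(n):
        groups.setdefault(find(sample), []).append(sample)
    return sorted(groups.values())


# Hypothetical cluster labels for four samples from three algorithms.
# Label values are arbitrary per algorithm; only co-membership matters.
labelings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
clusters = consensus_clusters(labelings)
```

Because only co-membership is compared, the approach is insensitive to how each algorithm numbers its clusters.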
Subjects
Adenocarcinoma of Lung, Tumor Biomarkers, Neoplastic Gene Expression Regulation, Immunotherapy, Lung Neoplasms, Machine Learning, Humans, Immunotherapy/methods, Prognosis, Adenocarcinoma of Lung/genetics, Adenocarcinoma of Lung/immunology, Adenocarcinoma of Lung/pathology, Adenocarcinoma of Lung/therapy, Tumor Biomarkers/genetics, Lung Neoplasms/genetics, Lung Neoplasms/therapy, Lung Neoplasms/immunology, Lung Neoplasms/pathology, Lung Neoplasms/diagnosis, Lung Neoplasms/mortality, Gene Expression Profiling, MicroRNAs/genetics, Multiomics
Abstract
Plant legumains are Asn/Asp-specific endopeptidases that have diverse functions in plants. Peptide asparaginyl ligases (PALs) are a special legumain subtype that primarily catalyzes peptide bond formation rather than hydrolysis. PALs are versatile protein engineering tools but are rarely found in nature. To overcome this limitation, here we describe a two-step method to design and engineer a high-yield and efficient recombinant PAL based on commonly found asparaginyl endopeptidases. We first constructed a consensus sequence derived from 1500 plant legumains to design the evolutionarily stable legumain conLEG, which could be produced in E. coli with a 20-fold higher yield than natural legumains. We then applied the ligase-activity determinant hypothesis to exploit conserved residues in PAL substrate-binding pockets and convert conLEG into conPAL1-3. Functional studies showed that conLEG is primarily a hydrolase, whereas conPALs are ligases. Importantly, conPAL3 is a superefficient and broadly active PAL for protein cyclization and ligation.
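The abstract does not give the procedure used to build the consensus from the 1500 legumain sequences. A minimal sketch of the usual idea, assuming the sequences have already been aligned to equal length, takes the most frequent residue in each alignment column and drops columns where a gap is most frequent. The function name `consensus_sequence` and the toy alignment are hypothetical; they are not legumain sequences.

```python
from collections import Counter


def consensus_sequence(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns in which the gap character is the most frequent symbol are
    omitted. Ties are broken in favor of the symbol seen first in the column.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    residues = []
    for column in zip(*aligned):
        residue, _count = Counter(column).most_common(1)[0]
        if residue != gap:
            residues.append(residue)
    return "".join(residues)


# Hypothetical toy alignment for illustration only.
alignment = ["MKT-LA", "MKTGLA", "MRT-LA", "MKS-LV"]
consensus = consensus_sequence(alignment)
```

A production workflow would typically also weight sequences to reduce redundancy among close homologs; that step is omitted here.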
Subjects
Escherichia coli, Plant Proteins, Amino Acid Sequence, Plant Proteins/metabolism, Cyclization, Escherichia coli/genetics, Escherichia coli/metabolism, Plants/metabolism, Peptide Synthases/metabolism, Protein Engineering, Peptides/metabolism, Endopeptidases/metabolism
Abstract
BACKGROUND: The sika deer (Cervus nippon) is an important cervid species, and three genomes have recently been published for it. However, these genomes still contain hundreds of gaps and differ significantly in continuity and accuracy. This poses challenges for functional genomics research and for the selection of an appropriate reference genome. A high-quality reference genome is therefore needed to study functional genomics effectively. FINDINGS: Here we report a high-quality consensus genome of a male sika deer. All 34 chromosomes are assembled into single-contig pseudomolecules without any gaps, making this the most complete assembly. The genome size is 2.7 Gb, with 23,284 protein-coding genes. Comparative genomics analysis found that the genomes of sika deer and red deer are highly conserved, with approximately 2.4 Gb of collinear regions sharing up to 99% sequence similarity. We also observed that red deer Chr23 and Chr4 fused during evolution to form sika deer Chr1. Additionally, based on this consensus reference genome, we identified 607 transcription factors (TFs) that are involved in the regulation of antler development, including RUNX2, SOX6, SOX8, SOX9, PAX8, SIX2, SIX4, SIX6, SPI1, NFAC1, KLHL8, ZN710, JDP2, and TWST2. CONCLUSIONS: Our results indicate that we obtained a high-quality consensus reference genome, which provides a valuable resource for functional genomics. In addition, we uncovered the genetic basis of sika-red hybrid fertility and identified 607 significant TFs that affect antler development.