1.
Commun Biol ; 7(1): 675, 2024 Jun 01.
Article En | MEDLINE | ID: mdl-38824179

The three-dimensional (3D) organization of the genome is fundamental to cell biology. To explore the 3D genome, emerging high-throughput approaches have produced billions of sequencing reads, which are challenging and time-consuming to analyze. Here we present Microcket, a package for mapping and extracting interacting pairs from 3D genomics data, including Hi-C, Micro-C, and derivative protocols. Microcket uses a unique read-stitch strategy that takes advantage of the long read cycles of modern DNA sequencers; benchmark evaluations reveal that Microcket runs much faster than current tools while also improving mapping efficiency, and it thus shows high potential for accelerating and enhancing biological investigations of the 3D genome. Microcket is freely available at https://github.com/hellosunking/Microcket .
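The general idea behind read stitching can be illustrated with a minimal sketch, assuming that the two mates of a short fragment overlap and can be merged by exact matching of read 1's 3' end against the reverse complement of read 2. The names `reverse_complement` and `stitch_pair` and the `min_overlap` threshold are hypothetical; this is not Microcket's implementation, and real stitching tools typically also tolerate sequencing mismatches.

```python
from typing import Optional

# Complement table for DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stitch_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge two overlapping mates into one fragment, or return None.

    read2 is reverse-complemented onto read1's strand; the longest suffix of
    read1 that exactly equals a prefix of that sequence is taken as the overlap.
    """
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    # Try the longest possible overlap first, down to the minimum allowed.
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read1[-overlap:] == rc2[:overlap]:
            return read1 + rc2[overlap:]
    return None


# Usage: a 27-bp fragment sequenced as two 20-bp mates that overlap by 13 bp.
fragment = "AAACCCGGGTTTACGTAGCTAGGATCC"
mate1 = fragment[:20]
mate2 = reverse_complement(fragment[-20:])
print(stitch_pair(mate1, mate2))  # prints the original 27-bp fragment
```

A stitched pair can then be mapped as a single longer read; pairs that do not overlap would be handled as ordinary paired-end reads.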


Genomics , Software , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Data Analysis
2.
BMC Genomics ; 25(1): 549, 2024 Jun 01.
Article En | MEDLINE | ID: mdl-38824509

BACKGROUND: Although Spirochetales are a ubiquitous and medically important order of bacteria infecting both humans and animals, there is extremely limited information regarding their bacteriophages. For the genus Treponema, only a single characterised prophage has been reported. RESULTS: We applied a bioinformatic approach to 24 previously published Treponema genomes to identify and characterise putative treponemal prophages. Thirteen of the genomes did not contain any detectable prophage regions. The remaining eleven contained 38 prophage sequences, with between one and eight putative prophages per bacterial genome. The prophage regions ranged from 12.4 to 75.1 kb and contained between 27 and 171 protein-coding sequences. Phylogenetic analysis revealed that 24 of the prophages formed three distinct sequence clusters, indicating putative myoviral and siphoviral morphologies. ViPTree analysis demonstrated that the identified sequences were novel compared with known double-stranded DNA bacteriophage genomes. CONCLUSIONS: In this study, we have started to address the knowledge gap on treponeme bacteriophages by characterising 38 prophage sequences in 24 treponeme genomes. Using bioinformatic approaches, we were able to identify the prophage-like elements and compare them with other bacteriophages in terms of their gene content and their potential to be functional and inducible bacteriophages, which in turn can help focus attention on specific prophages for further investigation.


Genome, Bacterial , Genomics , Phylogeny , Prophages , Treponema , Prophages/genetics , Treponema/genetics , Treponema/virology , Genomics/methods , Computational Biology/methods , Genome, Viral , Bacteriophages/genetics , Bacteriophages/classification
3.
BMC Cancer ; 24(1): 672, 2024 Jun 01.
Article En | MEDLINE | ID: mdl-38824541

BACKGROUND: Patients with primary multifocal hepatocellular carcinoma (HCC) have a poor prognosis and often experience a high rate of treatment failure. Multifocal HCC is mainly caused by intrahepatic metastasis (IM), and although portal vein tumor thrombosis (PVTT) is considered a hallmark of IM, the molecular mechanism by which primary HCC cells invade the portal veins remains unclear. It is therefore necessary to recognize the early signs of HCC metastasis in order to plan better treatment for patients. RESULTS: To determine the molecular features that differ between primary HCC with and without a metastatic phenotype, we used the CIBERSORTx software to deconvolute cell types from bulk RNA-Seq data based on a single-cell transcriptomic dataset. According to the relative abundance of tumorigenic and metastatic hepatoma cells, VEGFA+ macrophages, effector memory T cells, and natural killer cells, HCC samples were divided into five groups (Pro-T, Mix, Pro-Meta, NKC, and MemT), and the transcriptomic and genomic features of the first three groups were analyzed. We found that the Pro-T group appeared to retain native hepatic metabolic activity, whereas the Pro-Meta group had undergone dedifferentiation. Genes highly expressed in the Pro-Meta group often signify a worse outcome. CONCLUSIONS: The HCC cohort can be effectively subtyped, and prognosis predicted, according to tumor microenvironment components. Primary hepatocellular carcinoma may acquire the corresponding molecular features before metastasis occurs.


Carcinoma, Hepatocellular , Liver Neoplasms , Transcriptome , Tumor Microenvironment , Humans , Liver Neoplasms/genetics , Liver Neoplasms/secondary , Liver Neoplasms/pathology , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/pathology , Carcinoma, Hepatocellular/secondary , Tumor Microenvironment/genetics , Prognosis , Genomics/methods , Gene Expression Regulation, Neoplastic , Gene Expression Profiling , Male , Female , Killer Cells, Natural/metabolism , Killer Cells, Natural/immunology
4.
Article Zh | MEDLINE | ID: mdl-38811176

Objective: To compare the variant interpretation standards and guidelines issued in 2015 by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) (the 2015 ACMG/AMP guideline) with the expert specification for variant interpretation in hereditary hearing loss (HL) issued in 2018 by the Hearing Loss Expert Panel of the Clinical Genome Resource (ClinGen) (the 2018 HL-EP guideline), when evaluating the pathogenicity of OTOF gene variants in patients with auditory neuropathy. Methods: Thirty-eight auditory neuropathy patients carrying OTOF gene variants were selected as study subjects (23 males and 15 females, aged 0.3-25.9 years). Using whole-genome sequencing, whole-exome sequencing, or targeted region sequencing (panel) combined with Sanger sequencing, all 38 cases were found to carry more than two OTOF variant sites. A total of 59 candidate variants were independently interpreted under the 2015 ACMG/AMP guideline and the 2018 HL-EP guideline. Relative to the classification under the 2015 ACMG/AMP guideline, variants assigned a lower pathogenicity class under the 2018 HL-EP guideline were defined as downgraded variants, and variants assigned a higher class were defined as upgraded variants. Statistical analysis was conducted using SPSS 20.0. Results: The concordance rate of variant classification between the guidelines was 72.9% (43/59). Under the 2018 HL-EP guideline, 13.6% (8/59) of variants were upgraded and 13.6% (8/59) were downgraded. Several evidence criteria (PVS1, PM3, PP2, PP3 and PP5) differed significantly between the guidelines. The distribution of pathogenicity classifications for splicing variants differed significantly (P=0.013). Conclusions: When judging the pathogenicity of OTOF gene variants in patients with auditory neuropathy, the 2018 HL-EP guideline and the 2015 ACMG/AMP guideline give inconsistent results. By removing and refining evidence criteria and breaking with rigid, formulaic thinking, the 2018 HL-EP guideline makes pathogenicity grading more traceable and improves its credibility.
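The percentages in the Results follow directly from the reported counts. A short check in Python, with illustrative variable names, confirms that the three categories cover all 59 variants and that the rounded shares match:

```python
# Variant counts reported in the abstract (59 candidate OTOF variants).
TOTAL_VARIANTS = 59
counts = {"concordant": 43, "upgraded": 8, "downgraded": 8}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Every variant falls into exactly one of the three categories.
assert sum(counts.values()) == TOTAL_VARIANTS

for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL_VARIANTS} = {percent(n, TOTAL_VARIANTS)}%")
# concordant: 43/59 = 72.9%
# upgraded: 8/59 = 13.6%
# downgraded: 8/59 = 13.6%
```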


Hearing Loss, Central , Membrane Proteins , Mutation , Humans , Female , Male , Hearing Loss, Central/genetics , Child , Adult , Adolescent , Child, Preschool , Infant , Membrane Proteins/genetics , Young Adult , Genetic Variation , Exome Sequencing , Genetic Testing/methods , Whole Genome Sequencing/methods , Genomics/methods
5.
Cell Rep Methods ; 4(5): 100776, 2024 May 20.
Article En | MEDLINE | ID: mdl-38744287

Continual advancements in genomics have led to an ever-widening disparity between the rate of discovery of genetic variants and our current understanding of their functions and potential roles in disease. Systematic methods for phenotyping DNA variants are required to effectively translate genomics data into improved outcomes for patients with genetic diseases. To make the biggest impact, these approaches must be scalable and accurate, faithfully reflect disease biology, and define complex disease mechanisms. We compare current methods to analyze the function of variants in their endogenous DNA context using genome editing strategies, such as saturation genome editing, base editing and prime editing. We discuss how these technologies can be linked to high-content readouts to gain deep mechanistic insights into variant effects. Finally, we highlight key challenges that need to be addressed to bridge the genotype to phenotype gap, and ultimately improve the diagnosis and treatment of genetic diseases.


Gene Editing , Genetic Variation , Humans , Gene Editing/methods , Genetic Variation/genetics , DNA/genetics , CRISPR-Cas Systems/genetics , Genomics/methods , Animals , Phenotype
6.
BMC Genom Data ; 25(1): 49, 2024 May 30.
Article En | MEDLINE | ID: mdl-38816818

Oreomecon nudicaulis, commonly known as mountain poppy, is a significant perennial herb. In 2022, O. nudicaulis, previously classified in the genus Papaver, was reclassified into the genus Oreomecon. However, the phylogenetic status and chloroplast genomes of the genus Oreomecon have not yet been reported. This study elucidates the chloroplast genome sequence and structural features of O. nudicaulis and explores its evolutionary relationships within Papaveraceae. Using Illumina sequencing technology, the chloroplast genome of O. nudicaulis was sequenced, assembled, and annotated. The results indicate that the chloroplast genome of O. nudicaulis has a typical circular quadripartite structure. The chloroplast genome is 153,903 bp in length, with a GC content of 38.87%, and contains 84 protein-coding genes, 8 rRNA genes, 38 tRNA genes, and 2 pseudogenes. The genome encodes 25,815 codons; leucine (Leu) is the most frequently encoded amino acid, and AUU is the most frequently used codon. Additionally, 129 microsatellite markers were identified, of which mononucleotide repeats were the most abundant (53.49%). Our phylogenetic analysis revealed that O. nudicaulis is relatively closely related to the genus Meconopsis within the Papaveraceae family. The analysis also supported the taxonomic status of O. nudicaulis: it did not form a clade with other Papaver species, consistent with the revised taxonomy of Papaveraceae. This is the first report of a phylogenomic study of the complete chloroplast genome in the genus Oreomecon, a significant genus worldwide. This analysis of the O. nudicaulis chloroplast genome provides a theoretical basis for research on genetic diversity, molecular marker development, and species identification, enriching genetic information and supporting the study of evolutionary relationships among Papaveraceae.
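Genome statistics such as GC content and codon usage are simple counts over the sequence. A minimal sketch on toy sequences (not the O. nudicaulis data), assuming each coding sequence is supplied in frame; the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(cds_list: Iterable[str]) -> Counter:
    """Count codons across in-frame coding sequences, written in the RNA alphabet.

    A trailing partial codon (length not divisible by 3) is ignored.
    """
    counts: Counter = Counter()
    for cds in cds_list:
        rna = cds.upper().replace("T", "U")
        usable = len(rna) - len(rna) % 3
        counts.update(rna[i:i + 3] for i in range(0, usable, 3))
    return counts


# Usage with two toy coding sequences.
usage = codon_usage(["ATGATTATTTAA", "ATGATTGCC"])
print(gc_content("ATGC"))       # 50.0
print(usage.most_common(1))     # [('AUU', 3)]
```

Applied to all annotated protein-coding genes, the total of `usage.values()` gives the genome-wide codon count, and the most common entry gives the most frequently used codon.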


Genome, Chloroplast , Phylogeny , Genome, Chloroplast/genetics , Genomics/methods , Papaveraceae/genetics , Papaveraceae/chemistry , Microsatellite Repeats/genetics , Chloroplasts/genetics , Base Composition/genetics , Evolution, Molecular , RNA, Transfer/genetics
7.
Pathol Oncol Res ; 30: 1611676, 2024.
Article En | MEDLINE | ID: mdl-38818014

The large-scale heterogeneity of genetic diseases has necessitated deeper examination of nucleotide sequence alterations, facilitating the discovery of new drug targets. The emergence of new sequencing techniques was essential for obtaining more interpretable genomic data. Compared with earlier short reads, longer reads can provide better insight into potentially health-threatening genetic abnormalities. Long reads enable more accurate variant identification and genome assembly, advancing studies of nucleotide defects. In this review, we introduce the historical background of sequencing technologies and describe their benefits and limitations. Furthermore, we highlight the differences between short- and long-read approaches, including their unique advances and difficulties in methodology and evaluation. Additionally, we provide a detailed description of the corresponding bioinformatics and current applications.


High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods , Genomics/methods , Sequence Analysis, DNA/methods
8.
Cell Rep Med ; 5(5): 101565, 2024 May 21.
Article En | MEDLINE | ID: mdl-38776875

Chronic myeloid leukemia (CML) is readily treatable with tyrosine kinase inhibitors (TKIs); however, resistance occurs, and the disease is curable in only ∼15%-20% of patients. Using integrated functional genomics, Adnan Awad et al.¹ identify agents effective against CML stem cells and describe mechanisms underlying TKI resistance.


Drug Resistance, Neoplasm , Genomics , Leukemia, Myelogenous, Chronic, BCR-ABL Positive , Protein Kinase Inhibitors , Humans , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/drug therapy , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/pathology , Protein Kinase Inhibitors/therapeutic use , Protein Kinase Inhibitors/pharmacology , Drug Resistance, Neoplasm/genetics , Drug Resistance, Neoplasm/drug effects , Genomics/methods
9.
PLoS One ; 19(5): e0299588, 2024.
Article En | MEDLINE | ID: mdl-38718091

Corynebacterium glutamicum is a non-pathogenic species of the Corynebacteriaceae family. It has been broadly used in industrial biotechnology for the production of valuable products. Although it is widely used at the industrial level, knowledge about the genomic diversity of its strains is limited. Here, we investigated the comparative genomic features of the strains and their pan-genomic characteristics. We also inferred phylogenetic relationships among the strains based on average nucleotide identity (ANI). We found diversity between strains at the genomic and pan-genomic levels. Less than one-third of the C. glutamicum pan-genome consists of core and soft-core genes, whereas strain-specific genes account for about half of the total pan-genome. Moreover, the C. glutamicum pan-genome is open and expanding, which indicates that new gene families may still be added to it. We also investigated the distribution of biosynthetic gene clusters (BGCs) among the strains and found slight strain-level variation in BGCs. Several BGCs with the potential to express novel bioactive secondary metabolites were identified. Therefore, by exploiting the characteristic advantages of C. glutamicum, different strains may be promising candidates for natural drug discovery.
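Pan-genome categories such as core, soft-core, and strain-specific genes are assigned from how many strains carry each gene family. A minimal sketch, assuming a presence/absence table is already available; the 99% (core) and 95% (soft-core) cut-offs follow conventions used by common pan-genome tools such as Roary and are assumptions here, as is the catch-all "accessory" label, since the abstract does not state its thresholds:

```python
from collections import Counter
from typing import Dict, Set


def classify_gene_families(presence: Dict[str, Set[str]], n_strains: int,
                           core: float = 0.99,
                           soft_core: float = 0.95) -> Dict[str, str]:
    """Assign each gene family a pan-genome category from the fraction of strains carrying it."""
    categories: Dict[str, str] = {}
    for family, strains in presence.items():
        fraction = len(strains) / n_strains
        if fraction >= core:
            categories[family] = "core"
        elif fraction >= soft_core:
            categories[family] = "soft-core"
        elif len(strains) == 1:
            categories[family] = "strain-specific"
        else:
            categories[family] = "accessory"
    return categories


# Usage with 20 hypothetical strains S1..S20.
strains = {f"S{i}" for i in range(1, 21)}
presence = {
    "famA": set(strains),          # in all 20 strains
    "famB": strains - {"S20"},     # in 19 of 20 strains (95%)
    "famC": {"S1", "S2", "S3"},    # in 3 strains
    "famD": {"S7"},                # in a single strain
}
labels = classify_gene_families(presence, n_strains=len(strains))
print(Counter(labels.values()))
```

Tallying the labels over all gene families gives the category proportions; whether a pan-genome is open is a separate question, usually answered by fitting how the number of gene families grows as strains are added.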


Corynebacterium glutamicum , Genetic Variation , Genome, Bacterial , Phylogeny , Corynebacterium glutamicum/genetics , Corynebacterium glutamicum/metabolism , Multigene Family , Genomics/methods
10.
Sci Adv ; 10(19): eadj1424, 2024 May 10.
Article En | MEDLINE | ID: mdl-38718126

The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets to assess gene-disease probabilities throughout the phenome. We use graph neural networks, which capture the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate an average 6.9% boost in classification power, achieving a median receiver operating characteristic (ROC) area under the curve (AUC) of 0.90 across 5220 diseases from the Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating the problem of underpowered PheWAS associations. Results are available through an interactive web resource.
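The headline metric aggregates one ROC AUC per disease into a median. A minimal sketch of that computation, assuming binary labels that mark known disease genes; the scores are made up and this is not the Mantis-ML 2.0 code. The AUC is computed in its rank (Mann-Whitney) form:

```python
from statistics import median
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation). O(P*N), fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage: hypothetical gene-disease probabilities for three diseases
# (label 1 = known disease gene, 0 = not).
per_disease_auc = [
    roc_auc([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]),  # perfect ranking
    roc_auc([0.8, 0.5, 0.6, 0.1], [1, 0, 0, 1]),  # one positive ranked last
    roc_auc([0.6, 0.6, 0.2], [1, 0, 0]),          # includes a tie
]
print(per_disease_auc)          # [1.0, 0.5, 0.75]
print(median(per_disease_auc))  # 0.75
```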


Biological Specimen Banks , Neural Networks, Computer , Humans , Genome-Wide Association Study/methods , Phenotype , United Kingdom , Phenomics/methods , Genetic Predisposition to Disease , Genomics/methods , Databases, Genetic , Algorithms , Computational Biology/methods , UK Biobank
12.
Ter Arkh ; 96(3): 205-211, 2024 Apr 16.
Article Ru | MEDLINE | ID: mdl-38713033

The COVID-19 pandemic has highlighted pressing challenges in biomedical research methodology. It has become obvious that the rapid and effective development of treatments for "new" viral infections is impossible without coordinated interdisciplinary research and in-depth analysis of data obtained within the post-genomic paradigm. We present the results of a systematic computer analysis of 290,000 scientific articles on COVID-19, with an emphasis on the results of post-genomic studies of SARS-CoV-2. We show the futility of overly simplified approaches that consider only one "most important receptor protein", only one "key virus gene", etc. We also show how post-genomic technologies will make it possible to find informative biomarkers of severe coronavirus infection, including those based on complex immune disorders associated with COVID-19.


COVID-19 , SARS-CoV-2 , Humans , COVID-19/prevention & control , COVID-19 Drug Treatment , Genomics/methods , Antiviral Agents/therapeutic use , Antiviral Agents/pharmacology
13.
Theor Appl Genet ; 137(6): 122, 2024 May 07.
Article En | MEDLINE | ID: mdl-38713254

KEY MESSAGE: By deploying a multi-omics approach, we unraveled mechanisms that might help rice combat Yellow Stem Borer infestation, thus providing insights and scope for developing YSB-resistant rice varieties. Yellow Stem Borer (YSB), Scirpophaga incertulas (Walker) (Lepidoptera: Crambidae), is a major pest of rice that can cause 20-60% loss in rice production. Effective management of YSB infestation is hampered by the lack of adequate sources of resistance and a poor understanding of resistance mechanisms, necessitating studies that generate resources to breed YSB-resistant rice and to understand the rice-YSB interaction. In this study, using bulk-segregant analysis combined with next-generation sequencing, we mapped Quantitative Trait Loci (QTL) intervals on five rice chromosomes that could be associated with YSB resistance at the vegetative phase in a resistant rice line named SM92. Further, multiple SNP markers on rice chromosomes 1, 5, 10, and 12 that showed significant association with YSB resistance were developed. RNA sequencing of the susceptible and resistant lines revealed that several genes in the candidate QTL intervals were differentially regulated upon YSB infestation. Comparative transcriptome analysis revealed a putative candidate gene predicted to encode an alpha-amylase inhibitor. Analysis of the transcriptome and metabolite profiles further revealed a possible link between phenylpropanoid metabolism and YSB resistance. Taken together, our study provides deeper insights into the rice-YSB interaction and enhances understanding of the YSB resistance mechanism. Importantly, a promising breeding line and markers for YSB resistance have been developed that could aid marker-assisted breeding of YSB resistance into elite rice cultivars.


Chromosome Mapping , Moths , Oryza , Quantitative Trait Loci , Oryza/genetics , Oryza/parasitology , Oryza/immunology , Animals , Moths/physiology , Polymorphism, Single Nucleotide , Plant Diseases/parasitology , Plant Diseases/genetics , Plant Diseases/immunology , Disease Resistance/genetics , Genomics/methods , Phenotype , Multiomics
14.
PLoS One ; 19(5): e0302365, 2024.
Article En | MEDLINE | ID: mdl-38768140

In this study of evolutionary relationships in the subfamily Rubioideae (Rubiaceae), we take advantage of the off-target proportion of reads generated by previous target capture sequencing projects based on nuclear genomic data to build a plastome phylogeny and investigate cytonuclear discordance. The assembly of off-target reads resulted in a comprehensive plastome dataset and robust inference of phylogenetic relationships, in which most intratribal and intertribal relationships are resolved with strong support. While the phylogenetic results were mostly in agreement with previous studies based on plastome data, novel relationships from the plastid perspective were also detected. For example, our analyses of plastome data provide strong support for the SCOUT clade and its sister relationship to the remaining members of the subfamily, which differs from previous results based on plastid data but agrees with recent results based on nuclear genomic data. However, several instances of highly supported cytonuclear discordance were identified across the Rubioideae phylogeny. Coalescent simulation analysis indicates that while incomplete lineage sorting (ILS) could, by itself, explain the majority of the discordant relationships, plastome introgression may be the better explanation in some cases. Our study further indicates that plastomes across the Rubioideae are, with few exceptions, highly conserved and mainly conform to the structure, gene content, and gene order present in the majority of flowering plants.


Phylogeny , Plastids , Rubiaceae , Rubiaceae/genetics , Rubiaceae/classification , Plastids/genetics , Cell Nucleus/genetics , Genomics/methods , Genome, Plastid , Evolution, Molecular , Genome, Plant
15.
PLoS Biol ; 22(5): e3002632, 2024 May.
Article En | MEDLINE | ID: mdl-38768403

Reconstructing the tree of life remains a central goal in biology. Early methods, which relied on small numbers of morphological or genetic characters, often yielded conflicting evolutionary histories, undermining confidence in the results. Investigations based on phylogenomics, which use hundreds to thousands of loci for phylogenetic inquiry, have provided a clearer picture of life's history, but certain branches remain problematic. To resolve difficult nodes on the tree of life, 2 recent studies tested the utility of synteny, the conserved collinearity of orthologous genetic loci in 2 or more organisms, for phylogenetics. Synteny exhibits compelling phylogenomic potential while also raising new challenges. This Essay identifies and discusses specific opportunities and challenges that bear on the value of synteny data and other rare genomic changes for phylogenomic studies. Synteny-based analyses of highly contiguous genome assemblies mark a new chapter in the phylogenomic era and the quest to reconstruct the tree of life.


Genomics , Phylogeny , Synteny , Genomics/methods , Animals , Genome/genetics , Evolution, Molecular
16.
Molecules ; 29(9)2024 Apr 28.
Article En | MEDLINE | ID: mdl-38731531

Actinomycetes have long been recognized as an important source of antibacterial natural products. In recent years, actinomycetes from extreme environments have become a major research focus. Streptomyces sp. KN37 was isolated from the cold region of Kanas in Xinjiang. It demonstrated potent antimicrobial activity, but its primary active compounds remained unclear. We therefore aimed to combine genomics with traditional isolation methods to obtain bioactive compounds from strain KN37. Whole-genome sequencing and KEGG enrichment analysis indicated that KN37 has the potential to synthesize secondary metabolites, and 41 biosynthetic gene clusters were predicted, some of which showed high similarity to known gene clusters responsible for the biosynthesis of antimicrobial antibiotics. Traditional isolation methods and activity-guided fractionation were then used to isolate and purify seven compounds with strong bioactivity from the fermentation broth of strain KN37. These compounds were identified as 4-(Diethylamino)salicylaldehyde (1), 4-Nitrosodiphenylamine (2), N-(2,4-Dimethylphenyl)formamide (3), 4-Nitrocatechol (4), Methylsuccinic acid (5), Phenyllactic acid (6) and 5,6-Dimethylbenzimidazole (7). Moreover, 4-(Diethylamino)salicylaldehyde exhibited the most potent inhibitory effect against Rhizoctonia solani, with an EC50 value of 14.487 mg/L, while 4-Nitrosodiphenylamine showed strong antibacterial activity against Erwinia amylovora, with an EC50 value of 5.715 mg/L. This study successfully isolated several highly active antimicrobial compounds from the metabolites of strain KN37, which could serve as scaffolds for subsequent chemical synthesis. Although the newly predicted antibiotic-like substances have not yet been isolated, they still hold significant research value: they are instructive for studying biosynthetic pathways of active natural products, activating silent gene clusters, and constructing engineered bacteria.


Genomics , Multigene Family , Streptomyces , Streptomyces/genetics , Streptomyces/metabolism , Streptomyces/chemistry , Genomics/methods , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/isolation & purification , Anti-Bacterial Agents/biosynthesis , Microbial Sensitivity Tests , Biological Products/pharmacology , Biological Products/chemistry , Biological Products/isolation & purification , Anti-Infective Agents/pharmacology , Anti-Infective Agents/chemistry , Anti-Infective Agents/isolation & purification , Agriculture/methods , Whole Genome Sequencing
17.
Sci Rep ; 14(1): 10803, 2024 05 11.
Article En | MEDLINE | ID: mdl-38734771

The northern giant hornet Vespa mandarinia (NGH) is a voracious predator of other insect species, including honey bees. NGH's native range spans subtropical and temperate regions across much of east and southeast Asia and, in 2019, exotic populations of the species were discovered in North America. Despite this broad range and invasive potential, investigation of the population genomic structure of NGH across its native and introduced ranges has thus far been limited to a small number of mitochondrial samples. Here, we present analyses of genomic data from NGH individuals collected across the species' native range and from exotic individuals collected in North America. We provide the first survey of whole-genome population variation for any hornet species, covering this species' native and invasive ranges, and in doing so confirm likely origins in Japan and South Korea for the two introductions. We additionally show that, while this introduced population exhibited strongly elevated levels of inbreeding, these signatures of inbreeding are also present in some long-standing native populations, which may indicate that inbreeding depression alone is insufficient to prevent the persistence of NGH populations. As well as highlighting the importance of ongoing monitoring and eradication efforts to limit the spread of this species outside of its natural range, our data will serve as a foundational database for future genomic studies into introduced hornet populations.


Introduced Species , Wasps , Animals , North America , Wasps/genetics , Genetics, Population , Genomics/methods , Genetic Variation , Inbreeding , Genome, Insect
18.
Theor Appl Genet ; 137(6): 138, 2024 May 21.
Article En | MEDLINE | ID: mdl-38771334

KEY MESSAGE: Residual neural network genomic selection is the first GS algorithm to reach 35 layers, and its prediction accuracy surpasses that of previous algorithms. With decreasing DNA sequencing costs and the development of deep learning, the accuracy of phenotype prediction by genomic selection (GS) continues to improve. Residual networks, a widely validated deep learning technique, are introduced into deep learning for GS. Because each locus has a different weighted impact on the phenotype, strided convolutions are more suitable than pooling layers for GS problems. Building on these technical innovations, we propose a GS deep learning algorithm, residual neural network for genomic selection (ResGS). ResGS is the first neural network in GS to reach 35 layers. In 15 cases from four public datasets, the prediction accuracy of ResGS is in most cases higher than that of ridge-regression best linear unbiased prediction, support vector regression, random forest, gradient boosting regressor, and deep neural network genomic prediction. ResGS performs well in dealing with gene-environment interaction: when phenotypes from other environments are input into ResGS along with genetic data, the predictions are much better than when only genetic data are provided, which demonstrates the effectiveness of multi-modal learning for GS. Standard deviation is recommended as an auxiliary GS evaluation metric, which could improve the distribution of predicted results. Deep learning for GS, such as ResGS, is becoming increasingly accurate in phenotype prediction.
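The argument for strided convolutions over pooling can be seen in a toy one-dimensional example: a strided convolution downsamples using learned per-position weights, whereas max pooling keeps only the largest value in each window and has no weights. A pure-Python sketch, not the ResGS architecture; the kernel weights are arbitrary illustrative values and the 0/1/2 genotype coding is an assumption:

```python
from typing import List, Sequence


def conv1d_strided(x: Sequence[float], kernel: Sequence[float], stride: int) -> List[float]:
    """Valid 1D convolution (cross-correlation, as in deep learning libraries) with a stride.

    Each output is a weighted sum over a window, so different loci in the
    window can contribute with different weights.
    """
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(0, len(x) - k + 1, stride)]


def max_pool1d(x: Sequence[float], size: int, stride: int) -> List[float]:
    """Max pooling keeps only the largest value per window and has no weights."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


# Usage: six SNP genotypes coded as 0/1/2 copies of the alternative allele.
genotypes = [0, 1, 2, 1, 0, 2]
print(conv1d_strided(genotypes, kernel=[0.5, -1.0], stride=2))  # [-1.0, 0.0, -2.0]
print(max_pool1d(genotypes, size=2, stride=2))                   # [1, 2, 2]
```

Both operations halve the sequence length here, but only the convolution can learn that the two loci in a window affect the phenotype differently.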


Algorithms , Genomics , Neural Networks, Computer , Phenotype , Genomics/methods , Models, Genetic , Deep Learning , Gene-Environment Interaction , Selection, Genetic
19.
Nat Genet ; 56(5): 758-766, 2024 May.
Article En | MEDLINE | ID: mdl-38741017

Human pluripotent stem (hPS) cells can, in theory, be differentiated into any cell type, making them a powerful in vitro model for human biology. Recent technological advances have facilitated large-scale hPS cell studies that allow investigation of the genetic regulation of molecular phenotypes and their contribution to higher-order phenotypes such as human disease. Integrating hPS cells with single-cell sequencing makes it possible to identify context-dependent genetic effects during cell development or upon experimental manipulation. Here we discuss how the intersection of stem cell biology, population genetics and cellular genomics can help resolve the functional consequences of human genetic variation. We examine the critical challenges of integrating these fields and approaches to scaling them cost-effectively and practically. We highlight two areas of human biology that can particularly benefit from population-scale hPS cell studies: elucidating mechanisms underlying complex disease risk loci, and evaluating relationships between common genetic variation and pharmacotherapeutic phenotypes.


Genetics, Population , Genomics , Humans , Genomics/methods , Pluripotent Stem Cells , Genetic Variation , Phenotype , Single-Cell Analysis/methods , Disease/genetics
20.
Sci Rep ; 14(1): 11263, 2024 05 17.
Article En | MEDLINE | ID: mdl-38760420

Identifying cancer risk groups with multi-omics data has attracted researchers seeking biomarkers across diverse risk-related omics. Stratifying patients into cancer risk groups using genomics is essential for clinicians to provide preventive treatment, improve patient survival time, and identify appropriate therapy strategies. This study proposes a multi-omics framework that can extract features from various omics simultaneously. The framework employs autoencoders to learn a non-linear representation of the data and applies tensor analysis for feature learning. A clustering method is then used to stratify the patients into multiple cancer risk groups. Several omics were included in the experiments, namely methylation, somatic copy-number variation (SCNV), microRNA (miRNA) and RNA sequencing (RNAseq), from two cancer types, glioma and breast invasive carcinoma, in the TCGA dataset. The results of this study are promising, as evidenced by the survival analysis and classification models, which outperformed state-of-the-art methods. Patients can be significantly (p-value<0.05) divided into risk groups using latent variables extracted from the fused multi-omics data. The pipeline is open source to help researchers and clinicians identify patients' risk groups using genomics.


DNA Copy Number Variations , Genomics , Humans , Genomics/methods , DNA Methylation , Neoplasms/genetics , MicroRNAs/genetics , Female , Biomarkers, Tumor/genetics , Glioma/genetics , Glioma/pathology , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Multiomics