Results 1 - 20 of 82
1.
Phenomics ; 4(2): 109-124, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38884056

ABSTRACT

RNA sequencing (RNAseq) technology has become increasingly important in precision medicine and clinical diagnostics and has emerged as a powerful tool for identifying protein-coding genes, performing differential gene analysis, and inferring immune cell composition. Human peripheral blood samples are widely used for RNAseq, providing valuable insights into individual biomolecular information. Blood samples can be classified as whole blood (WB), plasma, serum, and remaining sediment samples, including plasma-free blood (PFB) and serum-free blood (SFB) samples, which are generally considered less useful byproducts of plasma and serum separation, respectively. However, the feasibility of using PFB and SFB samples for transcriptome analysis remains unclear. In this study, we aimed to assess the suitability of employing PFB or SFB samples as an alternative RNA source in transcriptomic analysis. We performed a comparative analysis of WB, PFB, and SFB samples for different applications. Our results revealed that PFB samples exhibit greater similarity to WB samples than SFB samples do in terms of protein-coding gene expression patterns, detection of differentially expressed genes, and immunological characterizations, suggesting that PFB can serve as a viable alternative to WB for transcriptomic analysis. Our study contributes to the optimization of blood sample utilization and the advancement of precision medicine research. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00121-1.

2.
Nat Genet ; 56(5): 846-860, 2024 May.
Article in English | MEDLINE | ID: mdl-38641644

ABSTRACT

Methylation quantitative trait loci (mQTLs) are essential for understanding the role of DNA methylation changes in genetic predisposition, yet they have not been fully characterized in East Asians (EAs). Here we identified mQTLs in whole blood from 3,523 Chinese individuals and replicated them in an additional 1,858 Chinese individuals from two cohorts. Over 9% of mQTLs displayed specificity to EAs, facilitating the fine-mapping of EA-specific genetic associations, as shown for variants associated with height. Trans-mQTL hotspots revealed biological pathways contributing to EA-specific genetic associations, including an ERG-mediated 233 trans-mCpG network, implicated in hematopoietic cell differentiation, which likely reflects binding efficiency modulation of the ERG protein complex. More than 90% of mQTLs were shared between different blood cell lineages, with a smaller fraction of lineage-specific mQTLs displaying preferential hypomethylation in the respective lineages. Our study provides new insights into the mQTL landscape across genetic ancestries and their downstream effects on cellular processes and diseases/traits.


Subject(s)
DNA Methylation , East Asian People , Quantitative Trait Loci , Female , Humans , Male , East Asian People/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Multifactorial Inheritance , Polymorphism, Single Nucleotide
3.
Nat Cancer ; 5(4): 673-690, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38347143

ABSTRACT

Molecular profiling guides precision treatment of breast cancer; however, Asian patients are underrepresented in publicly available large-scale studies. We established a comprehensive multiomics cohort of 773 Chinese patients with breast cancer and systematically analyzed their genomic, transcriptomic, proteomic, metabolomic, radiomic and digital pathology characteristics. Here we show that compared to breast cancers in white individuals, Asian individuals had more targetable AKT1 mutations. Integrated analysis revealed a higher proportion of HER2-enriched subtype and correspondingly more frequent ERBB2 amplification and higher HER2 protein abundance in the Chinese HR+HER2+ cohort, stressing anti-HER2 therapy for these individuals. Furthermore, comprehensive metabolomic and proteomic analyses revealed ferroptosis as a potential therapeutic target for basal-like tumors. The integration of clinical, transcriptomic, metabolomic, radiomic and pathological features allowed for efficient stratification of patients into groups with varying recurrence risks. Our study provides a public resource and new insights into the biology and ancestry specificity of breast cancer in the Asian population, offering potential for further precision treatment approaches.


Subject(s)
Asian People , Breast Neoplasms , Receptor, ErbB-2 , Humans , Breast Neoplasms/genetics , Breast Neoplasms/therapy , Female , Asian People/genetics , Receptor, ErbB-2/genetics , Mutation , Proteomics/methods , Gene Expression Profiling/methods , Proto-Oncogene Proteins c-akt/metabolism , Proto-Oncogene Proteins c-akt/genetics , Middle Aged , China/epidemiology , Ferroptosis/genetics , Adult , Metabolomics/methods , Transcriptome , Biomarkers, Tumor/genetics , East Asian People
4.
Adv Sci (Weinh) ; 11(15): e2305546, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38342612

ABSTRACT

The heterogeneity of triple-negative breast cancers (TNBC) remains challenging for various treatments. Ferroptosis, a recently identified form of cell death resulting from the unrestrained peroxidation of phospholipids, represents a potential vulnerability in TNBC. In this study, a high intensity focused ultrasound (HIFU)-driven nanomotor is developed for effective therapy of TNBC through induction of ferroptosis. Through bioinformatics analysis of typical ferroptosis-associated genes in the FUSCCTNBC dataset, gambogic acid is identified as a promising ferroptosis drug and loaded into the nanomotor. It is found that the rapid motion of nanomotors propelled by HIFU significantly enhanced tumor accumulation and penetration. More importantly, HIFU not only actuated nanomotors to trigger effective ferroptosis of TNBC cells, but also drove nanomotors to activate ferroptosis-mediated antitumor immunity in primary and metastatic TNBC models, resulting in effective tumor regression and prevention of metastases. Overall, HIFU-driven nanomotors show great potential for ferroptosis-immunotherapy of TNBC.


Subject(s)
Ferroptosis , Triple Negative Breast Neoplasms , Humans , Triple Negative Breast Neoplasms/therapy , Immunotherapy , Cell Death , Computational Biology
5.
Genome Biol ; 25(1): 34, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38268000

ABSTRACT

BACKGROUND: Various laboratory-developed metabolomic methods lead to big challenges in inter-laboratory comparability and effective integration of diverse datasets. RESULTS: As part of the Quartet Project, we establish a publicly available suite of four metabolite reference materials derived from B lymphoblastoid cell lines from a family of parents and monozygotic twin daughters. We generate comprehensive LC-MS-based metabolomic data from the Quartet reference materials using targeted and untargeted strategies in different laboratories. The Quartet multi-sample-based signal-to-noise ratio enables objective assessment of the reliability of intra-batch and cross-batch metabolomics profiling in detecting intrinsic biological differences among the four groups of samples. Significant variations in the reliability of the metabolomics profiling are identified across laboratories. Importantly, ratio-based metabolomics profiling, by scaling the absolute values of a study sample relative to those of a common reference sample, enables cross-laboratory quantitative data integration. Thus, we construct the ratio-based high-confidence reference datasets between two reference samples, providing "ground truth" for inter-laboratory accuracy assessment, which enables objective evaluation of quantitative metabolomics profiling using various instruments and protocols. CONCLUSIONS: Our study provides the community with rich resources and best practices for inter-laboratory proficiency tests and data integration, ensuring reliability of large-scale and longitudinal metabolomic studies.


Subject(s)
Liquid Chromatography-Mass Spectrometry , Metabolomics , Humans , Reproducibility of Results , Cell Line , Twins, Monozygotic
6.
Int J Parasitol Drugs Drug Resist ; 24: 100522, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38295619

ABSTRACT

Within the context of our anthelmintic discovery program, we recently identified and evaluated a quinoline derivative, called ABX464 or obefazimod, as a nematocidal candidate; synthesised a series of analogues which were assessed for activity against the free-living nematode Caenorhabditis elegans; and predicted compound-target relationships by thermal proteome profiling (TPP) and in silico docking. Here, we logically extended this work and critically evaluated the anthelmintic activity of ABX464 analogues on Haemonchus contortus (barber's pole worm) - a highly pathogenic nematode of ruminant livestock. First, we tested a series of 44 analogues on H. contortus (larvae and adults) to investigate the nematocidal pharmacophore of ABX464, and identified one compound that had greater potency than the parent compound and showed moderate activity against a select number of other parasitic nematodes (including Ancylostoma, Heligmosomoides and Strongyloides species). Using TPP and in silico modelling studies, we predicted protein HCON_00074590 (a predicted aldo-keto reductase) as a target candidate for ABX464 in H. contortus. Future work aims to optimise this compound as a nematocidal candidate and investigate its pharmacokinetic properties. Overall, this study presents a first step toward the development of a new nematocide.


Subject(s)
Anthelmintics , Haemonchus , Nematoda , Quinolines , Animals , Antinematodal Agents/pharmacology , Anthelmintics/pharmacology , Structure-Activity Relationship , Caenorhabditis elegans , Quinolines/pharmacology
7.
Bioorg Med Chem ; 98: 117540, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38134663

ABSTRACT

Global challenges with treatment failures and/or widespread resistance in parasitic worms against commercially available anthelmintics lend impetus to the development of new anthelmintics with novel mechanism(s) of action. The free-living nematode Caenorhabditis elegans is an important model organism used for drug discovery, including the screening and structure-activity investigation of new compounds, and target deconvolution. Previously, we conducted a whole-organism phenotypic screen of the 'Pandemic Response Box' (from Medicines for Malaria Venture, MMV) and identified a hit compound, called ABX464, with activity against C. elegans and a related, parasitic nematode, Haemonchus contortus. Here, we tested a series of 44 synthesized analogues to explore the pharmacophore of activity on C. elegans and revealed five compounds whose potency was similar to or greater than that of ABX464, but which were not toxic to human hepatoma (HepG2) cells. Subsequently, we employed thermal proteome profiling (TPP), protein structure prediction and an in silico-docking algorithm to predict ABX464-target candidates. Taken together, the findings from this study contribute significantly to the early-stage drug discovery of a new nematocide based on ABX464. Future work is aimed at validating the ABX464-protein interactions identified here, and at assessing ABX464 and associated analogues against a panel of parasitic nematodes, towards developing a new anthelmintic with a mechanism of action that is distinct from that of any compound currently available commercially.


Subject(s)
Anthelmintics , Nematoda , Quinolines , Animals , Humans , Caenorhabditis elegans , Anthelmintics/pharmacology , Anthelmintics/chemistry , Structure-Activity Relationship
8.
Genome Biol ; 24(1): 277, 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38049885

ABSTRACT

BACKGROUND: Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short and long sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technologies). RESULTS: The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map and for each haplotype. We also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9,726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity, including those located at long repeat regions, complex structural variants, and de novo mutations, are systematically examined in this study. CONCLUSIONS: In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offer expanded genomic coverage and insight into complex variant categories.


Subject(s)
Benchmarking , East Asian People , Twins, Monozygotic , Humans , East Asian People/genetics , Genomics , Haplotypes , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Twins, Monozygotic/genetics , Twin Studies as Topic
9.
Genome Biol ; 24(1): 270, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38012772

ABSTRACT

BACKGROUND: Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS: We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS: The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
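
The "built-in truth" of a parents-plus-monozygotic-twins design can be illustrated with a small consistency check. The sketch below is not the authors' pipeline: the function names, the tuple encoding of genotypes (0 = reference allele, 1 = alternate allele) and the toy calls are all assumptions made for illustration. It treats a site as consistent when both twins carry the same genotype and that genotype can be formed from one allele of each parent.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def quartet_consistent(twin1, twin2, father, mother):
    """True if the twins agree and their shared genotype fits the parents."""
    # Monozygotic twins are expected to carry the same genotype at a site.
    twins_agree = sorted(twin1) == sorted(twin2)
    return twins_agree and is_mendelian_consistent(twin1, father, mother)


# Hypothetical genotype calls: (twin1, twin2, father, mother) per site.
calls = [
    ((0, 1), (0, 1), (0, 0), (1, 1)),  # consistent
    ((1, 1), (1, 1), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 1), (0, 0)),  # twins disagree
    ((1, 1), (1, 1), (0, 0), (0, 1)),  # father cannot supply allele 1
]

consistent = sum(quartet_consistent(*site) for site in calls)
fraction = consistent / len(calls)
print(fraction)  # 0.5 for the toy calls above
```

A check of this kind needs no external truth set, which is why it can be applied outside benchmark regions. Genuine de novo mutations would also be flagged as inconsistent, so the fraction is best read as a rough proxy for call quality.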


Subject(s)
Benchmarking , Genome, Human , Humans , Reproducibility of Results , Polymorphism, Single Nucleotide , Germ Cells , High-Throughput Nucleotide Sequencing/methods
10.
Genome Biol ; 24(1): 245, 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37884999

ABSTRACT

The Quartet Data Portal facilitates community access to well-characterized reference materials, reference datasets, and related resources established based on a family of four individuals with identical twins from the Quartet Project. Users can request DNA, RNA, protein, and metabolite reference materials, as well as datasets generated across omics, platforms, labs, protocols, and batches. Reproducible analysis tools allow for objective performance assessment of user-submitted data, while interactive visualization tools support rapid exploration of reference datasets. A closed-loop "distribution-collection-evaluation-integration" workflow enables updates and integration of community-contributed multiomics data. Ultimately, this portal helps promote the advancement of reference datasets and multiomics quality control.


Subject(s)
Multiomics , Software , Humans , Quality Control
12.
Nat Biotechnol ; 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37679545

ABSTRACT

Certified RNA reference materials are indispensable for assessing the reliability of RNA sequencing to detect intrinsically small biological differences in clinical settings, such as molecular subtyping of diseases. As part of the Quartet Project for quality control and data integration of multi-omics profiling, we established four RNA reference materials derived from immortalized B-lymphoblastoid cell lines from four members of a monozygotic twin family. Additionally, we constructed ratio-based transcriptome-wide reference datasets between two samples, providing cross-platform and cross-laboratory 'ground truth'. Investigation of the intrinsically subtle biological differences among the Quartet samples enables sensitive assessment of cross-batch integration of transcriptomic measurements at the ratio level. The Quartet RNA reference materials, combined with the ratio-based reference datasets, can serve as unique resources for assessing and improving the quality of transcriptomic data in clinical and biological settings.

13.
Nat Biotechnol ; 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37679543

ABSTRACT

Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome of different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how using a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration and establishes the advantages of ratio-based multi-omics profiling with common reference materials.

14.
Genome Biol ; 24(1): 201, 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37674217

ABSTRACT

BACKGROUND: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations are not adequately assessed in terms of omics types, the performance metrics, and the application scenarios. RESULTS: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. CONCLUSIONS: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
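
The ratio-based scaling described in this abstract can be sketched in a few lines. This is a minimal illustration rather than the published implementation: the function name `ratio_scale`, the use of log2 ratios, the averaging of reference replicates within a batch and the toy abundances are all assumptions. The idea shown is that each study sample is expressed relative to the reference material measured in the same batch, so a multiplicative batch effect shared by both cancels out.

```python
import math


def ratio_scale(batch, reference_ids):
    """Return log2 ratios of each study sample to the batch's reference profile.

    batch: dict mapping sample id -> {feature: positive abundance}.
    reference_ids: ids of the reference-material samples profiled in this batch.
    """
    features = list(next(iter(batch.values())).keys())
    # Per-feature reference level: mean log2 abundance over reference replicates.
    ref_level = {
        f: sum(math.log2(batch[r][f]) for r in reference_ids) / len(reference_ids)
        for f in features
    }
    # Subtracting in log space equals dividing on the original scale.
    return {
        s: {f: math.log2(v[f]) - ref_level[f] for f in features}
        for s, v in batch.items()
        if s not in reference_ids
    }


# Toy data (made-up numbers): one study sample measured in two batches;
# batch 2 carries a 4-fold multiplicative batch effect on every feature.
batch1 = {
    "ref": {"geneA": 10.0, "geneB": 8.0},
    "study": {"geneA": 20.0, "geneB": 4.0},
}
batch2 = {s: {f: x * 4.0 for f, x in v.items()} for s, v in batch1.items()}

scaled1 = ratio_scale(batch1, ["ref"])
scaled2 = ratio_scale(batch2, ["ref"])
print(scaled1)
print(scaled2)
```

In this toy case the two batches give the same ratio-scale values for the study sample even though their absolute values differ four-fold. Real batch effects are not purely multiplicative, so the cancellation would only be approximate.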


Subject(s)
Algorithms , Multiomics , Base Composition , Benchmarking , Clinical Relevance
15.
Genome Biol ; 24(1): 202, 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37674236

ABSTRACT

BACKGROUND: Quantitative proteomics is an indispensable tool in life science research. However, there is a lack of reference materials for evaluating the reproducibility of label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based measurements among different instruments and laboratories. RESULTS: Here, we develop the Quartet standard as a proteome reference material with built-in truths, and distribute the same aliquots to 15 laboratories with nine conventional LC-MS/MS platforms across six cities in China. The relative abundances of over 12,000 proteins across 816 mass spectrometry files are obtained and compared for reproducibility among the instruments and laboratories to ultimately generate proteomics benchmark datasets. There is a wide dynamic range of proteomes spanning about 7 orders of magnitude, and the injection order has marked effects on quantitative rather than qualitative characteristics. CONCLUSION: Overall, the Quartet offers valuable standard materials and data resources for improving the quality control of proteomic analyses as well as the reproducibility and reliability of research findings.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Chromatography, Liquid , Reproducibility of Results , Proteome
16.
Transl Oncol ; 37: 101759, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37579711

ABSTRACT

Glioma undergoes adaptive changes, leading to poor prognosis and resistance to treatment. CD99 influences the migration and invasion of glioma cells and plays an oncogenic role. However, whether CD99 affects the adaptiveness of gliomas has not been well studied, so its clinical value is underestimated. Here, we used our in-house and public multiomics datasets for bioinformatic analysis and conducted immunohistochemistry staining to investigate the role of CD99 in glioma adaptive response and its clinical implications. CD99 is expressed in more adaptive glioma subtypes and cell states. Under hypoxic conditions, CD99 is upregulated in glioma cells and is associated with angiogenesis and metabolic adaptations. Gliomas overexpressing CD99 also showed increased immunosuppressive tumor-associated macrophages. The association of CD99 with tumor adaptiveness had clinical significance. We discovered that CD99 overexpression is associated with short-time recurrence and validated its prognostic value. Additionally, glioma patients with high expression of CD99 were resistant to chemotherapy and radiotherapy. CD99 expression was also related to anti-angiogenic and immune checkpoint inhibitor therapy response. Inhibitors of the PI3K-AKT pathway have therapeutic potential against CD99-overexpressing gliomas. Our study identified CD99 as a biomarker characterizing the adaptive response in glioma. Gliomas with high CD99 expression are highly tolerant to stress conditions such as hypoxia and antitumor immunity, resulting in poorer treatment responses and tumor progression. Therefore, for patients with CD99-overexpressing gliomas, tumor adaptiveness should be fully considered during treatment to avoid drug resistance, and closer clinical monitoring should be carried out to improve the prognosis.

17.
Int J Mol Sci ; 24(15), 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37569696

ABSTRACT

Biodiversity within the animal kingdom is associated with extensive molecular diversity. The expansion of genomic, transcriptomic and proteomic data sets for invertebrate groups and species with unique biological traits necessitates reliable in silico tools for the accurate identification and annotation of molecules and molecular groups. However, conventional tools are inadequate for lesser-known organismal groups, such as eukaryotic pathogens (parasites), so that improved approaches are urgently needed. Here, we established a combined sequence- and structure-based workflow system to harness well-curated publicly available data sets and resources to identify, classify and annotate proteases and protease inhibitors of a highly pathogenic parasitic roundworm (nematode) of global relevance, called Haemonchus contortus (barber's pole worm). This workflow performed markedly better than conventional, sequence-based classification and annotation alone and allowed the first genome-wide characterisation of protease and protease inhibitor genes and gene products in this worm. In total, we identified 790 genes encoding 860 proteases and protease inhibitors representing 83 gene families. The proteins inferred included 280 metallo-, 145 cysteine, 142 serine, 121 aspartic and 81 "mixed" proteases as well as 91 protease inhibitors, all of which had marked physicochemical diversity and inferred involvements in >400 biological processes or pathways. A detailed investigation revealed a remarkable expansion of some protease or inhibitor gene families, which are likely linked to parasitism (e.g., host-parasite interactions, immunomodulation and blood-feeding) and exhibit stage- or sex-specific transcription profiles. This investigation provides a solid foundation for detailed explorations of the structures and functions of proteases and protease inhibitors of H. contortus and related nematodes, and it could assist in the discovery of new drug or vaccine targets against infections or diseases.


Subject(s)
Haemonchus , Nematoda , Parasites , Animals , Male , Female , Haemonchus/genetics , Haemonchus/chemistry , Haemonchus/metabolism , Host-Parasite Interactions/genetics , Peptide Hydrolases/metabolism , Proteomics , Protease Inhibitors/pharmacology , Protease Inhibitors/metabolism , Endopeptidases/metabolism , Informatics
18.
EBioMedicine ; 94: 104728, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37506543

ABSTRACT

BACKGROUND: Ground-glass opacity (GGO)-like lung adenocarcinoma (LUAD) has been detected increasingly in the clinic and its inert property and superior survival indicate unique biological characteristics. However, little is known about these characteristics, which hampers identification of the key reasons for the inert property of GGO-like LUAD. METHODS: Using whole-exome sequencing and RNA sequencing, taking into account both radiological and pathological classifications of the same 197 patients concomitantly, we systematically interrogate genes driving the progression from GGO to solid nodule and potential reasons for the inertia of GGO. Using flow cytometry and IHC, we validated the abundance of immune cells and activity of cell proliferation. FINDINGS: Identifying the differences between GGO and solid nodule, we found adenocarcinoma in situ/minimally invasive adenocarcinoma (AIS/MIA) and GGO-like LUAD exhibited lower TP53 mutation frequency and less active cell proliferation-related pathways than solid nodule in LUAD. Identifying the differences in GGO between AIS/MIA and LUAD, we noticed that EGFR mutation frequency and CNV load were significantly higher in LUAD than in AIS/MIA. Regulatory T cell abundance was also higher in LUAD, while CD8+ T cell abundance decreased from AIS/MIA to LUAD. Finally, we constructed a transcriptomic signature to quantify the development from GGO to solid nodule, which was an independent predictor of patients' prognosis in 11 external LUAD datasets. INTERPRETATION: Our results provide deeper insights into the indolent nature of GGO and provide a molecular basis for the treatment of GGO-like LUAD. FUNDING: This study was supported in part by the National Natural Science Foundation of China (32170657), the National Natural Science Foundation of China (82203037), and Shanghai Sailing Program (22YF1408900).

19.
Comput Struct Biotechnol J ; 21: 2696-2704, 2023.
Article in English | MEDLINE | ID: mdl-37143762

ABSTRACT

Major advances in genomic and associated technologies have demanded reliable bioinformatic tools and workflows for the annotation of genes and their products via comparative analyses using well-curated reference data sets, accessible in public repositories. However, the accurate in silico annotation of molecules (proteins) encoded in organisms (e.g., multicellular parasites) which are evolutionarily distant from those for which these extensive reference data sets are available, including invertebrate model organisms (e.g., Caenorhabditis elegans - free-living nematode, and Drosophila melanogaster - the vinegar fly) and vertebrate species (e.g., Homo sapiens and Mus musculus), remains a major challenge. Here, we constructed an informatic workflow for the enhanced annotation of biologically-important, excretory/secretory (ES) proteins ("secretome") encoded in the genome of a parasitic roundworm, called Haemonchus contortus (commonly known as the barber's pole worm). We critically evaluated the performance of five distinct methods, refined some of them, and then combined the use of all five methods to comprehensively annotate ES proteins, according to gene ontology, biological pathways and/or metabolic (enzymatic) processes. Then, using optimised parameter settings, we applied this workflow to comprehensively annotate 2591 of all 3353 proteins (77.3%) in the secretome of H. contortus. This result is a substantial improvement (10-25%) over previous annotations using individual, "off-the-shelf" algorithms and default settings, indicating the ready applicability of the present, refined workflow to gene/protein sequence data sets from a wide range of organisms in the Tree-of-Life.

20.
Front Genet ; 14: 1107353, 2023.
Article in English | MEDLINE | ID: mdl-36968580

ABSTRACT

Sericinus montelus (Lepidoptera, Papilionidae, Parnassiinae) is a high-value ornamental swallowtail butterfly species widely distributed in Northern and Central China, Japan, Korea, and Russia. The larval stage of this species feeds exclusively on Aristolochia plants. Aristolochia species are well known for their high levels of aristolochic acids (AAs), which have been found to be carcinogenic for numerous animals. The swallowtail butterfly is among the few that can feed on these toxic host plants. However, the genetic adaptation of S. montelus to confer new abilities for AA tolerance has not yet been well explored, largely due to the limited genomic resources of this species. This study aimed to present a chromosome-level reference genome for S. montelus using Oxford Nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C technology. The final assembly was composed of 581.44 Mb with an expected genome size of 619.27 Mb. Further, 99.98% of the bases could be anchored onto 30 chromosomes. The N50 of contigs and scaffolds was 5.74 and 19.12 Mb, respectively. Approximately 48.86% of the assembled genome was suggested to be repeat elements, and 13,720 protein-coding genes were predicted in the current assembly. The phylogenetic analysis indicated that S. montelus diverged from the common ancestor of swallowtails about 58.57-80.46 million years ago. Compared with related species, S. montelus showed a significant expansion of P450 gene family members, and positive selections on eloa, heatr1, and aph1a resulted in the AA tolerance for S. montelus larvae. The de novo assembly of a high-quality reference genome for S. montelus provided a fundamental genomic tool for future research on evolution, genome genetics, and toxicology of the swallowtail butterflies.
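
For readers unfamiliar with the N50 statistic quoted in this abstract, the following sketch shows the standard definition: the length of the contig (or scaffold) at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy lengths (arbitrary units): total is 54, half is 27,
# and 10 + 9 + 8 reaches 27, so the N50 is 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # 8
```

A larger N50 indicates that more of the assembly sits in long sequences, which is why the scaffold N50 after Hi-C anchoring is higher than the contig N50.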
