1 - 20 of 34
1.
Mol Genet Genomics ; 299(1): 37, 2024 Mar 18.
Article En | MEDLINE | ID: mdl-38494535

Identity by descent (IBD) segments, uninterrupted DNA segments derived from the same ancestral chromosomes, are widely used as indicators of relationships in genetics. A great deal of research focuses on IBD segments between related pairs, while statistical analyses of segments in unrelated individuals are rare. In this study, we investigated the basic informative features of IBD segments in unrelated pairs in Chinese populations from the 1000 Genomes Project. A total of 5922 IBD segments in Chinese interpopulation unrelated individual pairs were detected via IBIS, and the average IBD segment length was 3.71 Mb. It was found that 17.86% of unrelated pairs shared at least one IBD segment in the Chinese cohort. Furthermore, a total of 49 chromosomal regions where IBD segments clustered in high abundance were identified, which might be sharing hotspots in the human genome. Such regions could also be observed in other ancestry populations, which implies that similar IBD backgrounds also exist. Altogether, these results demonstrated the distribution of common background IBD segments, which helps improve the accuracy of pedigree studies based on IBD analysis.


Asian People , Genome, Human , Humans , Asian People/genetics , Genome, Human/genetics , Pedigree , Research Design , China
2.
ArXiv ; 2024 Feb 15.
Article En | MEDLINE | ID: mdl-38410647

Effective DNA embedding remains crucial in genomic analysis, particularly in scenarios lacking labeled data for model fine-tuning, despite the significant advancements in genome foundation models. A prime example is metagenomics binning, a critical process in microbiome research that aims to group DNA sequences by their species from a complex mixture of DNA sequences derived from potentially thousands of distinct, often uncharacterized species. To fill the lack of effective DNA embedding models, we introduce DNABERT-S, a genome foundation model that specializes in creating species-aware DNA embeddings. To encourage effective embeddings to error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C2LR) strategy. Empirical results on 18 diverse datasets showed DNABERT-S's remarkable performance. It outperforms the top baseline's performance in 10-shot species classification with just a 2-shot training while doubling the Adjusted Rand Index (ARI) in species clustering and substantially increasing the number of correctly identified species in metagenomics binning. The code, data, and pre-trained model are publicly available at https://github.com/Zhihan1996/DNABERT_S.
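The abstract reports clustering quality as the Adjusted Rand Index (ARI). As a reading aid, the following is a minimal sketch of the standard ARI formula in plain Python; it is not taken from the DNABERT-S code base, and the function name `adjusted_rand_index` is an illustrative assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        # With fewer than two items there are no pairs to disagree on.
        return 1.0

    # Contingency counts: items per (true class, predicted cluster) cell.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both / in each clustering.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. one single cluster on both sides.
        return 1.0
    return (index - expected) / (max_index - expected)
```

An ARI of 1 means the predicted clusters match the species labels exactly (up to renaming), while values near 0 indicate chance-level agreement, which is why a doubling of ARI is a meaningful gain.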

3.
Electrophoresis ; 45(5-6): 505-516, 2024 Mar.
Article En | MEDLINE | ID: mdl-38037287

Insertion/deletion polymorphisms (InDels) are a category of highly prevalent markers in the human genome, characterized by their distinctive attributes, including short amplicon sizes and low mutation rates, which have shown great potential in forensic applications. Multi-allelic InDel and multi-InDel markers, collectively abbreviated as MM-InDels, were developed to enhance polymorphism by the introduction of novel alleles. Nevertheless, the relatively low mutation rates of InDels, coupled with the founder effect, result in distinct allele frequency distributions among populations. The divergent characteristics of InDels in different populations also pose challenges to the establishment of universally efficient InDel multiplex assays. To enhance the system efficiency of the InDel assay and its applicability across diverse populations, 39 MM-InDels with high polymorphism in five different ancestry superpopulations were selected from the 1000 Genomes Project dataset and combined with an amelogenin gender marker to construct a multiplex assay (named MMIDplex). The combined power of discrimination and the cumulative probability of exclusion of the 39 MM-InDels were 1 - 1.3 × 10^-23 and 1 - 9.83 × 10^-6 in the Chinese Han population, and larger than 1 - 10^-19 and 1 - 10^-4 in the reference populations, respectively. These results demonstrate that the MMIDplex assay has the potential to obtain sufficient power for individual identification and paternity testing in global populations.
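The combined statistics quoted above are conventionally built by multiplying per-locus values. The sketch below assumes the usual textbook definitions and independent loci; it is not the authors' code, and the function names are illustrative.

```python
from math import prod


def power_of_discrimination(genotype_freqs):
    """PD at one locus: 1 minus the chance two random people share a genotype."""
    return 1.0 - sum(f * f for f in genotype_freqs)


def combined_match_probability(pd_values):
    """Product of per-locus match probabilities (1 - PD_i); equals 1 - CPD."""
    return prod(1.0 - pd for pd in pd_values)


def combined_non_exclusion(pe_values):
    """Product of per-locus non-exclusion probabilities (1 - PE_i); equals 1 - CPE."""
    return prod(1.0 - pe for pe in pe_values)
```

The functions return the complements (1 - CPD and 1 - CPE) on purpose: a value such as 1 - 1.3 × 10^-23 rounds to exactly 1.0 in double-precision arithmetic, so the small complement is the quantity that can actually be computed and compared.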


Forensic Genetics , Polymorphism, Genetic , Humans , Forensic Genetics/methods , Gene Frequency/genetics , Asian People , INDEL Mutation , Genetics, Population , China
4.
Front Genet ; 14: 1182028, 2023.
Article En | MEDLINE | ID: mdl-37205119

The Y-chromosomal haplogroup tree, which consists of a group of Y-chromosomal loci with phylogenetic information, has been widely applied in anthropology, archaeology and population genetics. With the continuous updating of the phylogenetic structure, the Y-chromosomal haplogroup tree provides more information for tracing the biogeographical origin of Y chromosomes. Generally, Y-chromosomal insertion-deletion polymorphisms (Y-InDels) are as genetically stable as Y-chromosomal single nucleotide polymorphisms (Y-SNPs), and therefore carry mutations that can accumulate over generations. In this study, potential phylogenetically informative Y-InDels were screened in haplogroup O-M175, which is dominant in East Asia, based on population data retrieved from the 1000 Genomes Project. A group of 22 phylogenetically informative Y-InDels were identified and then assigned to their corresponding subclades of haplogroup O-M175, which provided a supplement for the update and application of Y-chromosomal markers. In particular, four Y-InDels were introduced to define subclades determined using a single Y-SNP.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9964-9980, 2023 Aug.
Article En | MEDLINE | ID: mdl-37027688

Learning with noisy labels has become imperative in the Big Data era, as it saves the expensive human labor of accurate annotation. Previous noise-transition-based methods have achieved theoretically grounded performance under the Class-Conditional Noise model (CCN). However, these approaches build upon an ideal but impractical anchor set available to pre-estimate the noise transition. Even though subsequent works adapt the estimation as a neural layer, the ill-posed stochastic learning of its parameters in back-propagation easily falls into undesired local minima. We solve this problem by introducing a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework. By projecting the noise transition into the Dirichlet space, the learning is constrained on a simplex characterized by the complete dataset, instead of some ad-hoc parametric space wrapped by the neural layer. We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels to train the classifier and to model the noise. Our approach safeguards the stable update of the noise transition, which avoids the arbitrary tuning from a mini-batch of samples used previously. We further generalize LCCN to different counterparts compatible with open-set noisy labels, semi-supervised learning as well as cross-model training. A range of experiments demonstrate the advantages of LCCN and its variants over the current state-of-the-art methods. The code is available.


Algorithms , Big Data , Humans , Bayes Theorem , Supervised Machine Learning
7.
Mol Neurobiol ; 60(6): 3345-3364, 2023 Jun.
Article En | MEDLINE | ID: mdl-36853430

Defective autophagy relates to the pathogenesis of Parkinson's disease (PD), a typical neurodegenerative disease. Our recent study has demonstrated that PD toxins (6-OHDA, MPP+, or rotenone) induce neuronal apoptosis by impeding the AMPK/Akt-mTOR signaling. Here, we show that treatment with 6-OHDA, MPP+, or rotenone triggered decreases of ATG5/LC3-II and autophagosome formation with a concomitant increase of p62 in PC12, SH-SY5Y cells, and primary neurons, suggesting inhibition of autophagy. Interestingly, overexpression of wild-type ATG5 attenuated the inhibitory effect of PD toxins on autophagy, reducing neuronal apoptosis. The effects of PD toxins on autophagy and apoptosis were found to be associated with activation of PTEN and inactivation of Akt. Overexpression of dominant negative PTEN, constitutively active Akt and/or pretreatment with rapamycin rescued the cells from PD toxins-induced downregulation of ATG5/LC3-II and upregulation of p62, as well as consequential autophagosome diminishment and apoptosis in the cells. The effects of PD toxins on autophagy and apoptosis were linked to excessive intracellular and mitochondrial hydrogen peroxide (H2O2) production, as evidenced by using the H2O2-scavenging enzyme catalase, the mitochondrial superoxide indicator MitoSOX and the mitochondria-selective superoxide scavenger Mito-TEMPO. Furthermore, we observed that treatment with PD toxins reduced the protein level of Parkin in the cells. Knockdown of Parkin alleviated the effects of PD toxins on H2O2 production, PTEN/Akt activity, autophagy, and apoptosis in the cells, whereas overexpression of wild-type Parkin exacerbated these effects of PD toxins, implying the involvement of Parkin in the PD toxins-induced oxidative stress. Taken together, the results indicate that PD toxins can elicit mitochondrial H2O2, which can activate PTEN and inactivate Akt leading to autophagy inhibition-dependent neuronal apoptosis, and Parkin plays a critical role in this process.
Our findings suggest that co-manipulation of the PTEN/Akt/autophagy signaling by antioxidants may be exploited for the prevention of neuronal loss in PD.


Neuroblastoma , Neurodegenerative Diseases , Parkinson Disease , Humans , Parkinson Disease/pathology , Hydrogen Peroxide/metabolism , Proto-Oncogene Proteins c-akt/metabolism , TOR Serine-Threonine Kinases/metabolism , Rotenone/pharmacology , Rotenone/metabolism , Superoxides/metabolism , Neurodegenerative Diseases/metabolism , Oxidopamine/pharmacology , Neuroblastoma/pathology , Neurons/metabolism , Apoptosis , Autophagy , Mitochondria/metabolism , PTEN Phosphohydrolase/metabolism
8.
Environ Sci Pollut Res Int ; 30(12): 34158-34173, 2023 Mar.
Article En | MEDLINE | ID: mdl-36508098

This paper investigates the long-run effects of PM2.5 exposure in utero on the mental health of adolescents. Using nationally representative survey data from China, we instrument the PM2.5 exposure with wind speed to tackle the possible endogeneity problem. Our results show that mothers' PM2.5 exposure during their pregnancy negatively affects the mental health of their children aged between 10 and 15 years. A 1 µg/m³ increase in PM2.5 exposure in utero increases the probability of having a severe mental illness for adolescents by 0.6%. Our evidence supports the "fetal origins" hypothesis. We also find that fetal PM2.5 exposure leads adolescents to be more likely to be absent from school and quarrel with their parents, implying that fetal PM2.5 exposure may affect individuals' behavior when they grow up.
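The identification strategy above relies on an instrumental variable. The following is a minimal sketch of why an instrument helps, using a single-instrument estimator on simulated data only; the data-generating process, variable roles, and coefficient values are assumptions made for illustration and do not reproduce the paper's specification or results.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(n, beta, seed=0):
    """Simulated data: z shifts x, while a hidden confounder u drives both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)              # instrument (stands in for wind speed)
        ui = rng.gauss(0, 1)              # unobserved confounder
        xi = zi + ui + rng.gauss(0, 1)    # exposure (stands in for PM2.5)
        yi = beta * xi + 2.0 * ui + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(x, y):
    """Naive regression slope; biased when x is correlated with the error."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimator: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)
```

With one instrument and one endogenous regressor this ratio coincides with two-stage least squares. In the simulation the naive slope absorbs the confounder's effect, whereas the IV slope stays close to the true `beta`, provided the instrument affects the outcome only through the exposure.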


Air Pollutants , Air Pollution , Child , Pregnancy , Adolescent , Female , Humans , Air Pollutants/analysis , Particulate Matter/analysis , Air Pollution/analysis , Mental Health , China , Environmental Exposure
9.
Int J Legal Med ; 137(1): 1-12, 2023 Jan.
Article En | MEDLINE | ID: mdl-36326889

Insertion/Deletion (InDel) polymorphic genetic markers are abundant in human genomes. Diallelic InDel markers have been widely studied for forensic purposes, yet the low polymorphic information content limits their application and current InDel panels remain to be improved. In this study, multi-allelic InDels located outside low-complexity sequence regions were selected in the datasets from East Asian populations, and a multiplex amplification system containing 31 multi-allelic InDel markers and the Amelogenin marker (FA-HID32plex) was constructed and optimized. The preliminary study on sensitivity, species specificity, inhibitor tolerance, mixture resolution, and the detection of degraded samples demonstrates that the FA-HID32plex is highly sensitive, specific, and robust for traces and degraded samples. The combined power of discrimination (CPD) of the 31 multi-allelic InDel markers was 0.999 999 999 999 999 999 85, and the cumulative probability of exclusion (CPE) was 0.999 920 in a Chinese Han population, which indicates a high discrimination power. Altogether, the FA-HID32plex panel could provide reliable supplementary or stand-alone information in individual identification and paternity testing, especially for challenging samples.


DNA Fingerprinting , Forensic Genetics , Humans , Asian People/genetics , Paternity , INDEL Mutation , Genetics, Population , Gene Frequency
10.
Genes (Basel) ; 13(8), 2022 Aug 04.
Article En | MEDLINE | ID: mdl-36011297

Obtaining a full short tandem repeat (STR) profile from low template DNA (LT-DNA) still presents a challenge for conventional methods due to significant stochastic effects and polymerase slippage. A novel amplification method with a lower cost and higher accuracy is required to increase the DNA amount. Previous studies suggested that DNA polymerases without bypass activity could not perform processive DNA synthesis beyond abasic sites in vitro, and our results showed a lack of bypass activity for Phusion, Pfu and KAPA DNA polymerases in this study. Based on this feature, we developed a novel linear amplification method, termed Linear Amplification for double-stranded DNA using primers with abasic sites near 3' end (abLAFD), to limit the replication error. The amplification efficiency was evaluated by qPCR analysis with a result of approximately a 130-fold increase in target DNA. In an LT-DNA analysis, the abLAFD method can be employed as a pre-PCR. Similar to nested PCRs, primer sets used for the abLAFD method were designed as external primers suitable for commercial multiplex STR amplification assays. The practical performance of the abLAFD method was evaluated by coupling it to a routine PP21 STR analysis using 50 pg and 25 pg DNA. Compared to reference profiles, all abLAFD profiles showed significantly recovered alleles, increased average peak height and heterozygote balance with a comparable stutter ratio. Altogether, our results support the theory that the abLAFD method is a promising strategy coupled to STR typing for forensic LT-DNA analysis.


DNA , Alleles , DNA/analysis , DNA/genetics , Heterozygote , Polymerase Chain Reaction/methods
11.
Biochem Pharmacol ; 202: 115139, 2022 Aug.
Article En | MEDLINE | ID: mdl-35697119

Therapeutically targeting B cells has received great attention in the treatment of B-cell malignancies and autoimmune diseases. The B-cell activating factor (BAFF) is critical to the survival of normal and neoplastic B cells, and excess production of BAFF contributes to autoimmune diseases. Resveratrol, a natural polyphenolic compound, has a positive effect on the treatment of autoimmune diseases. However, how resveratrol affects BAFF-stimulated B-cell proliferation and survival is poorly understood. Here, we show that resveratrol increased autophagosome formation and ATG5/LC3-II levels and decreased p62 level, promoting autophagic flux/autophagy and thereby suppressing the basal or human soluble BAFF (hsBAFF)-stimulated proliferation and survival of normal and B-lymphoid (Raji) cells. This is supported by the findings that inhibition of autophagy with 3-methyladenine (3-MA, an inhibitor of Vps34) or ATG5 shRNA attenuates resveratrol-induced autophagy and -reduced proliferation/viability in B-cells. Inhibition of mTOR with rapamycin or knockdown of mTOR potentiated resveratrol-induced autophagy and inhibition of hsBAFF-stimulated B-cell proliferation/viability, while overexpression of wild-type mTOR conferred resistance to the actions of resveratrol. Similarly, inhibition of Akt with Akt inhibitor X or ectopic expression of dominant negative Akt reinforced resveratrol-induced autophagy and inhibition of hsBAFF-stimulated B-cell proliferation/viability, whereas expression of constitutively active Akt conferred resistance to the actions of resveratrol. Taken together, these results indicate that resveratrol induces autophagy impeding BAFF-stimulated proliferation and survival via blocking the Akt/mTOR signaling pathway in normal and neoplastic B cells. Our findings highlight that resveratrol has a great potential for prevention and treatment of excessive BAFF-elicited aggressive B-cell disorders and autoimmune diseases.


Autoimmune Diseases , B-Cell Activating Factor , Apoptosis , Autophagy , B-Cell Activating Factor/genetics , B-Cell Activating Factor/metabolism , B-Cell Activating Factor/pharmacology , Cell Proliferation , Cell Survival , Humans , Proto-Oncogene Proteins c-akt/metabolism , Resveratrol/pharmacology , TOR Serine-Threonine Kinases/metabolism
12.
Forensic Sci Int ; 334: 111270, 2022 May.
Article En | MEDLINE | ID: mdl-35306348

The Y chromosome plays an important role in forensic practice due to its unique paternal inheritance pattern. Y-chromosomal single nucleotide polymorphisms (Y-SNPs) could provide supplementary information where the application of Y-chromosomal STR (Y-STR) haplotypes encounters its limitations. Y-SNPs with recurrent mutation can be seen in different Y-chromosomal haplogroups, which might help discriminate different paternal pedigrees. In this study, a host of candidate Y-SNPs with recurrent mutation were obtained based on population data from the 1000 Genomes Project. Further, 8 Y-SNPs from a small part of the candidates were confirmed to be polymorphic in 2 or more Y-chromosomal haplogroups (sub-haplogroups) in the Chinese Han population. With a haplotype diversity value of 0.9367, the investigated subset of Y-SNPs with recurrent mutation shows a high discrimination power. Therefore, Y-SNPs with recurrent mutation should function as useful markers to provide information in forensic applications.
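The haplotype diversity value cited above is commonly computed with Nei's unbiased estimator. The sketch below assumes that definition; the abstract does not state which estimator was used, and the function name is illustrative.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's estimator: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_squared_freqs = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_squared_freqs)
```

Each haplotype can be passed as any hashable value, for example a tuple of allele states across the typed Y-SNPs. The estimator is 0 when every sampled man carries the same haplotype and 1 when every haplotype is unique.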


Chromosomes, Human, Y , Polymorphism, Single Nucleotide , Genetics, Population , Haplotypes , Humans , Microsatellite Repeats , Mutation
13.
Gigascience ; 12, 2022 Dec 28.
Article En | MEDLINE | ID: mdl-37602759

BACKGROUND: Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) of existing drugs/compounds. It is especially critical for emerging and/or orphan diseases due to its cheaper investment and shorter research cycle compared with traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a main obstacle for computational drug repurposing methods to be widely adopted in clinical settings. RESULTS: In this work, we propose KGML-xDTD: a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a 2-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable MOAs. We leverage knowledge-and-publication-based information to extract biologically meaningful "demonstration paths" as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both predictions of drug repurposing and recapitulation of human-curated drug MOA paths. CONCLUSIONS: KGML-xDTD is the first model framework that can offer KG path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce "black-box" concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations and further accelerate the process of drug discovery for emerging diseases.


Drug Discovery , Pattern Recognition, Automated , Humans , Knowledge , Machine Learning , Probability
14.
Front Genet ; 12: 784605, 2021.
Article En | MEDLINE | ID: mdl-34868274

The application of X-chromosomal short tandem repeats (X-STRs) has been recognized as a powerful tool in complex kinship testing. To support further development of X-STR analysis in forensic use, we identified nine novel X-STRs, which could be clustered into three linkage groups (LGs) on Xp21.1, Xq21.31, and Xq23. A multiplex PCR system was built based on electrophoresis. A total of 198 unrelated Shanghai Han samples along with 168 samples from 43 families were collected to investigate the genetic polymorphism and forensic parameters of the nine loci. Allele numbers ranged from 5 to 12, and amplicon sizes ranged from 146 to 477 bp. The multiplex showed high values for the combined power of discrimination (0.99997977 in males and 0.99999999 in females) and combined mean exclusion chances (0.99997918 and 0.99997821 in trios, 0.99984939 in duos, and 0.99984200 in deficiency cases). The linkage between all pairs of loci was estimated via the Kosambi mapping function and a linkage disequilibrium test, and further investigated through the family study. The data from 43 families strongly demonstrated independent transmission between LGs and tight linkage among loci within the same LG. All these results support that the newly described X-STRs and the multiplex system are highly promising for further forensic use.
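The Kosambi mapping function mentioned above converts between recombination fraction and genetic map distance. A minimal sketch of the standard formula follows; the function names are illustrative, and the abstract does not describe how the authors implemented it.

```python
import math


def kosambi_distance_cm(recombination_fraction):
    """Map distance in centimorgans from a recombination fraction r in [0, 0.5)."""
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_recombination_fraction(distance_cm):
    """Inverse mapping: r = 1/2 * tanh(2d), with d in Morgans."""
    return 0.5 * math.tanh(2.0 * distance_cm / 100.0)
```

For tightly linked loci the two scales nearly coincide (a recombination fraction of 0.01 is about 1 cM), while the recombination fraction approaches 0.5 as map distance grows, which corresponds to the independent transmission reported between linkage groups.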

15.
Int Immunopharmacol ; 96: 107771, 2021 Jul.
Article En | MEDLINE | ID: mdl-34004440

B-cell activating factor (BAFF) is an essential cytokine for B-cell maturation, differentiation and survival, and excess BAFF induces aggressive or neoplastic B-cell disorders and contributes to development of autoimmune diseases. Metformin, an anti-diabetic drug, has recently garnered great attention due to its anti-proliferative and immune-modulatory features. However, little is known regarding the effect of metformin on BAFF-stimulated B cells. Here, we show that metformin attenuated human soluble BAFF (hsBAFF)-induced cell proliferation and survival by blocking the Erk1/2 pathway in normal and B-lymphoid (Raji) cells. Pretreatment with U0126, knockdown of Erk1/2, or expression of dominant negative MKK1 strengthened metformin's inhibition of hsBAFF-activated Erk1/2 and B-cell proliferation/viability, whereas expression of constitutively active MKK1 rendered high resistance to metformin. Further investigation found that overexpression of wild type PTEN or ectopic expression of dominant negative Akt potentiated metformin's suppression of hsBAFF-induced Erk1/2 activation and proliferation/viability in Raji cells, implying that a PTEN/Akt-dependent mechanism is involved. Furthermore, we noticed that metformin hindered the hsBAFF-activated mTOR pathway in B cells. Inhibition of mTOR with rapamycin or knockdown of mTOR enhanced metformin's suppression of hsBAFF-induced phosphorylation of S6K1, PTEN, Akt, and Erk1/2, as well as B-cell proliferation/viability. These results indicate that metformin prevents BAFF activation of Erk1/2 from driving cell proliferation and survival by impeding the mTOR-PTEN/Akt signaling pathway in normal and neoplastic B-lymphoid cells. Our findings support that metformin has great potential for prevention of excessive BAFF-induced aggressive B-cell malignancies and autoimmune diseases.


B-Cell Activating Factor/metabolism , B-Lymphocytes/drug effects , Metformin/pharmacology , Mitogen-Activated Protein Kinase 1/antagonists & inhibitors , Mitogen-Activated Protein Kinase 3/antagonists & inhibitors , Animals , B-Cell Activating Factor/genetics , B-Lymphocytes/cytology , B-Lymphocytes/immunology , B-Lymphocytes/metabolism , Cell Line, Tumor , Cell Proliferation/physiology , Cell Survival/physiology , Humans , Hypoglycemic Agents/pharmacology , Lymphocyte Activation/drug effects , Mice , PTEN Phosphohydrolase/antagonists & inhibitors , Primary Cell Culture , Proto-Oncogene Proteins c-akt/antagonists & inhibitors , Signal Transduction , TOR Serine-Threonine Kinases/antagonists & inhibitors
16.
Int J Legal Med ; 135(5): 1727-1735, 2021 Sep.
Article En | MEDLINE | ID: mdl-33666691

The discrimination of body fluid stains provides crucial evidence during the investigation of criminal cases. Previous studies have demonstrated the practical value of mRNA profiling in body fluid identification. The conventional strategy of mRNA profiling entails reverse transcription and PCR amplification in two separate procedures with different buffer systems. In this study, we applied the one-step multiplex reverse transcription PCR strategy to mRNA profiling with the inclusion of the same 18 tissue-specific biomarkers as in the F18plex system targeting peripheral blood, menstrual blood, vaginal secretion, saliva, semen, and urine. The Qiagen OneStep RT-PCR kit and Titanium One-Step RT-PCR kit were applied to multiplex construction, and reproducible profiling results were obtained with both kits. Compared to the F18plex system, similar expression profiles of biomarkers were obtained in targeted tissues, while expected cross-reaction was observed in non-targeted body fluids. However, CYP2B7P1 and SPINK5 were detected in menstrual blood samples, which was not observed using the F18plex system. Full-profiling results were obtained in all samples using 0.1 ng peripheral blood and semen RNA, and 1 ng menstrual blood, vaginal secretion, saliva, and urine RNA. In conclusion, the application of the one-step mRNA profiling strategy could be a reliable and economical method for the simplified, specific, and simultaneous analysis of tissue-specific biomarkers for the discrimination of body fluid origin.


Body Fluids/chemistry , Gene Expression Profiling , Multiplex Polymerase Chain Reaction/methods , RNA, Messenger/analysis , Reverse Transcriptase Polymerase Chain Reaction/methods , Biomarkers/chemistry , Female , Humans , Male
17.
Bioinformatics ; 37(15): 2112-2120, 2021 Aug 09.
Article En | MEDLINE | ID: mdl-33538820

MOTIVATION: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture especially in data-scarce scenarios. RESULTS: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferable understanding of genomic DNA sequences based on up and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that pre-trained DNABERT with human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analysis tasks. AVAILABILITY AND IMPLEMENTATION: The source code, pretrained and finetuned model for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

18.
Front Genet ; 12: 809815, 2021.
Article En | MEDLINE | ID: mdl-35178073

Multiple mutational events of insertion/deletion occurring at or around InDel sites could form multi-allelic InDels and multi-InDels (abbreviated as MM-InDels), while InDels with random DNA sequences could imply a unique mutation event at these loci. In this study, preliminary investigation of MM-InDels with random sequences was conducted using high-throughput phased data from the 1000 Genomes Project. A total of 3,599 multi-allelic InDels and 6,375 multi-InDels were filtered with multiple alleles. A vast majority of the obtained MM-InDels (85.59%) presented 3 alleles, which implies that only one secondary insertion or deletion mutation event occurred at these loci. The more frequent presence of two adjacent InDel loci was observed within 20 bp. MM-InDels with random sequences presented an uneven distribution across the genome and showed a correlation with InDels, SNPs, recombination rate, and GC content. The average allelic frequencies and prevalence of multi-allelic InDels and multi-InDels presented similar distribution patterns in different populations. Altogether, MM-InDels with random sequences can provide useful information for population resolution.

19.
Front Public Health ; 9: 760792, 2021.
Article En | MEDLINE | ID: mdl-34988048

Objective: China and many developing countries have placed high expectations on the general practice healthcare system in terms of lowering medical costs and improving the health status of the multimorbid population in recent years. However, the prevalence of multimorbidity among inpatients attending the general practice department of hospitals and its policy implications are largely unknown. The current study aimed to analyze the prevalence of comorbidities among inpatients attending the general practice department of the tertiary Grade-A Hospitals in China, and put forward evidence-based policy recommendations. Methods: Between December 2016 and November 2020, 351 registered general practitioners from 27 tertiary hospitals were selected, and their direct admissions were evaluated. The rate and composition ratio were used for descriptive analysis of the clinical and epidemiological characteristics of multimorbidity. A backward stepwise algorithm was used to explore independent variables. The absence of multicollinearity and plausible interactions among variables were tested to ensure the robustness of the logistic regression model. The pyramid diagram was used to show the link between gender and the involved human body system in multimorbidity. Results: Multimorbidity was present in 93.1% of the 64,395 patients who were admitted directly. Multimorbidity was significantly more prevalent in patients aged 45-59 years (OR = 3.018, 95% CI = 1.945-4.683), 60-74 years (OR = 4.349, 95% CI = 2.574-7.349), ≥75 years (OR = 7.804, 95% CI = 3.665-16.616), and those with body mass index (BMI) ≥ 28 kg/m² (OR = 3.770, 95% CI = 1.453-9.785). The circulatory system was found to be the most commonly involved human body system in multimorbidity, accounting for 79.2% (95% CI = 78.8-79.5%) of all cases. Significant gender inequity was further observed in the involved human body system in multimorbidity.
Conclusion: Multimorbidity is likely common among the inpatients attending the general practice department of hospitals in China and many developing countries, with significant gender inequity in the involved human body systems. Effective countermeasures include establishing a GP-PCIC multimorbidity prevention and control model and enhancing the management of multimorbidity in elderly and obese patients at both the clinical and healthy lifestyle levels. The diagnosis and treatment capabilities of GPs on the circulatory, endocrine, metabolic, digestive, and respiratory systems should be prioritized.
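The results above are reported as odds ratios (OR) with 95% confidence intervals. The study's values come from a logistic regression model; as a simpler illustration of how such an interval is formed, the sketch below computes an unadjusted OR and its Wald interval from a 2×2 table. The function name and the cell layout are assumptions for illustration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which matches the shape of the intervals quoted in the abstract (for example 3.665-16.616 around 7.804). An interval that excludes 1 corresponds to a statistically significant association at the chosen level.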


General Practice , Multimorbidity , Aged , China/epidemiology , Delivery of Health Care , Humans , Longitudinal Studies
20.
Forensic Sci Int Genet ; 47: 102312, 2020 Jul.
Article En | MEDLINE | ID: mdl-32480323

Currently, mRNA profiling is widely investigated for forensic body fluid identification, while it is still required to advance the approach for those casework samples of limited quantity or low quality. The inclusion of circular RNAs (circRNAs) can facilitate the detection of mRNA markers in forensic body fluid identification. In this study, a multiplex assay for forensic body fluid identification (F18plex assay) was developed by incorporating 14 tissue-specific mRNA markers with circRNAs expression, 2 mRNA markers with high abundance and 2 housekeeping markers for the discrimination of the most common forensic body fluids, including blood, menstrual blood, saliva, vaginal secretion, semen and urine. The markers employed in the F18plex assay show similar specificity to previous reports. Additionally, even if all linear transcripts were completely erased, the expected markers in target biofluids could still be identified, which should help the discrimination of those aged biological stains. Results from sensitivity testing and the detection of mixtures demonstrate good sensitivity of the multiplex assay. Generally, full biomarker profiles could be obtained with ≥1 µl of blood, saliva, or semen, and ≥1 ng of total RNAs from menstrual blood, vaginal secretion, or urine samples, respectively, using this multiplex assay under the established conditions. Collectively, the newly established multiplex assay can assist in determining the biological origin of forensic stains.


Forensic Genetics/methods , Genetic Markers , Multiplex Polymerase Chain Reaction , RNA, Circular/metabolism , RNA, Messenger/metabolism , Adult , Animals , Blood Chemical Analysis , Cervix Mucus/chemistry , Female , Humans , Male , Menstruation , Middle Aged , Saliva/chemistry , Semen/chemistry , Sensitivity and Specificity , Urine/chemistry , Young Adult
...