Results 1 - 20 of 45
1.
BMC Bioinformatics ; 25(1): 109, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38475727

ABSTRACT

BACKGROUND: Parent-of-origin allele-specific gene expression (ASE) can be detected in interspecies hybrids by virtue of RNA sequence variants between the parental haplotypes. ASE is detectable by differential expression analysis (DEA) applied to the counts of RNA-seq read pairs aligned to parental references, but aligners do not always choose the correct parental reference. RESULTS: We used public data for species that are known to hybridize. We measured our ability to assign RNA-seq read pairs to their proper transcriptome or genome references. We tested software packages that assign each read pair to a reference position and found that they often favored the incorrect species reference. To address this problem, we introduce a post process that extracts alignment features and trains a random forest classifier to choose the better alignment. On each simulated hybrid dataset tested, our machine-learning post-processor achieved higher accuracy than the aligner by itself at choosing the correct parent-of-origin per RNA-seq read pair. CONCLUSIONS: For the parent-of-origin classification of RNA-seq, machine learning can improve the accuracy of alignment-based methods. This approach could be useful for enhancing ASE detection in interspecies hybrids, though RNA-seq from real hybrids may present challenges not captured by our simulations. We believe this is the first application of machine learning to this problem domain.


Subjects
Software , Transcriptome , RNA-Seq , Sequence Analysis, RNA/methods , Machine Learning
2.
Theor Appl Genet ; 137(6): 130, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744692

ABSTRACT

KEY MESSAGE: Genome-wide association study of color spaces across the four cultivated Capsicum spp. revealed a shared set of genes influencing fruit color, suggesting that mechanisms and pathways across Capsicum species were conserved during speciation. Notably, Cytochrome P450 of the carotenoid pathway, MYB transcription factor, and pentatricopeptide repeat-containing protein are the major genes responsible for fruit color variation across the Capsicum species. Peppers (Capsicum spp.) rank among the most widely consumed spices globally. Fruit color, serving as a determinant for use in food colorants and cosmeceuticals and an indicator of nutritional contents, significantly influences market quality and price. Cultivated Capsicum species display extensive phenotypic diversity, especially in fruit coloration. Our study leveraged the genetic variance within four Capsicum species (Capsicum baccatum, Capsicum chinense, Capsicum frutescens, and Capsicum annuum) to elucidate the genetic mechanisms driving color variation in peppers and related Solanaceae species. We analyzed color metrics and chromatic attributes (Red, Green, Blue, L*, a*, b*, Luminosity, Hue, and Chroma) on samples cultivated over six years (2015-2021). We resolved genomic regions associated with fruit color diversity through the sets of SNPs obtained from Genotyping by Sequencing (GBS) and genome-wide association study (GWAS) with a Multi-Locus Mixed Linear Model (MLMM). Significant SNPs were identified after FDR correction; those within Cytochrome P450, MYB-related genes, pentatricopeptide repeat proteins, and the ABC transporter family were the most common among the four species, indicating comparative evolution of fruit colors. We further validated the role of a pentatricopeptide repeat-containing protein (Chr01:31,205,460) and a cytochrome P450 enzyme (Chr08:45,351,919) via competitive allele-specific PCR (KASP) genotyping.
Our findings advance the understanding of the genetic underpinnings of Capsicum fruit coloration, with developed KASP assays holding potential for applications in crop breeding and aligning with consumer preferences. This study provides a cornerstone for future research into exploiting Capsicum's diverse fruit color variation.


Subjects
Capsicum , Fruit , Phenotype , Pigmentation , Polymorphism, Single Nucleotide , Capsicum/genetics , Capsicum/growth & development , Fruit/genetics , Fruit/growth & development , Pigmentation/genetics , Color , Genotype , Genome-Wide Association Study , Quantitative Trait Loci , Cytochrome P-450 Enzyme System/genetics , Plant Proteins/genetics , Plant Proteins/metabolism , Genetic Variation
3.
Neurol Sci ; 45(3): 1041-1050, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37759100

ABSTRACT

BACKGROUND: The Apolipoprotein E (APOE) ε4 allele is a risk factor for late-onset Alzheimer's disease (AD). However, no investigation has focused on racial differences in the longitudinal effect of APOE genotypes on CSF amyloid beta (Aβ42) and tau levels in AD. METHODS: This study used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI): 222 participants with AD, 264 who were cognitively normal (CN), and 692 with mild cognitive impairment (MCI) at baseline and two-year follow-up. We used a linear mixed model to investigate the effect of APOE-ε4 genotypes on longitudinal changes in the amyloid beta and tau levels. RESULTS: Individuals with 1 or 2 APOE ε4 alleles showed significantly higher t-Tau and p-Tau, but lower amyloid beta Aβ42, compared with individuals without APOE ε4 alleles. Significantly higher levels of log-t-Tau and log-p-Tau, and low levels of log-Aβ42, were observed in subjects who were older, female, or in the two diagnostic groups (AD and MCI). Higher p-Tau and Aβ42 values were associated with poor Mini-Mental State Examination (MMSE) performance. Non-Hispanic African American (AA) and Hispanic participants were associated with decreased log-t-Tau levels (β = -0.154, p = 0.0112 and β = -0.207, p = 0.0016, respectively) as compared to those observed in Whites. Furthermore, Hispanic participants were associated with a decreased log-p-Tau level (β = -0.224, p = 0.0023) compared to those observed in Whites. There were no differences in Aβ42 level for non-Hispanic AA and Hispanic participants compared with White participants. CONCLUSION: Our study, for the first time, showed that the APOE ε4 allele was associated with these biomarkers, although to differing degrees among racial groups.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Female , Humans , Male , Alzheimer Disease/genetics , Alzheimer Disease/diagnosis , Amyloid beta-Peptides , Apolipoprotein E4/genetics , Apolipoproteins E/genetics , Biomarkers , Cognitive Dysfunction/genetics , Cognitive Dysfunction/diagnosis , Peptide Fragments , Race Factors , tau Proteins
4.
Int J Mol Sci ; 25(12)2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38928266

ABSTRACT

Curcumin, a polyphenol derived from Curcuma longa, used as a dietary spice, has garnered attention for its therapeutic potential, including antioxidant, anti-inflammatory, and antimicrobial properties. Despite its known benefits, the precise mechanisms underlying curcumin's effects on consumers remain unclear. To address this gap, we employed the genetic model Drosophila melanogaster and leveraged two omics tools: transcriptomics and metabolomics. Our investigation revealed alterations in 1043 genes and 73 metabolites upon supplementing curcumin into the diet. Notably, we observed genetic modulation in pathways related to antioxidants, carbohydrates, and lipids, as well as genes associated with gustatory perception and reproductive processes. Metabolites implicated in carbohydrate metabolism, amino acid biosynthesis, and biomarkers linked to the prevention of neurodegenerative diseases such as schizophrenia, Alzheimer's, and aging were also identified. The study highlighted a strong correlation between the curcumin diet, antioxidant mechanisms, and amino acid metabolism. Conversely, a lower correlation was observed between carbohydrate metabolism and cholesterol biosynthesis. This research highlights the impact of curcumin on the diet, influencing perception, fertility, and molecular wellness. Furthermore, it directs future studies toward a more focused exploration of the specific effects of curcumin consumption.


Subjects
Curcumin , Drosophila melanogaster , Metabolome , Transcriptome , Animals , Drosophila melanogaster/drug effects , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Curcumin/pharmacology , Curcumin/administration & dosage , Metabolome/drug effects , Transcriptome/drug effects , Antioxidants/pharmacology , Antioxidants/metabolism , Diet , Metabolomics/methods
5.
Brief Bioinform ; 22(2): 1767-1781, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32363395

ABSTRACT

Modern machine learning techniques (such as deep learning) offer immense opportunities in the field of human biological aging research. Aging is a complex process, experienced by all living organisms. While traditional machine learning and data mining approaches are still popular in aging research, they typically need feature engineering or feature extraction for robust performance. Explicit feature engineering represents a major challenge, as it requires significant domain knowledge. The latest advances in deep learning provide a paradigm shift in eliciting meaningful knowledge from complex data without performing explicit feature engineering. In this article, we review the recent literature on applying deep learning in biological age estimation. We consider the current data modalities that have been used to study aging and the deep learning architectures that have been applied. We identify four broad classes of measures to quantify the performance of algorithms for biological age estimation and, based on these, evaluate the current approaches. The paper concludes with a brief discussion on possible future directions in biological aging research using deep learning. This study has significant potential for improving our understanding of the health status of individuals, for instance, based on their physical activities, blood samples and body shapes. Thus, the results of the study could have implications in different health care settings, from palliative care to public health.


Subjects
Aging/physiology , Deep Learning , Anthropometry , Biomarkers/metabolism , Computational Biology/methods , Electronic Health Records , Epigenesis, Genetic , Exercise , Humans , Neural Networks, Computer
6.
BMC Bioinformatics ; 23(1): 266, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35804303

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner. This is time-consuming. RESULTS: In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with M proteins can be transformed into a much simpler problem: finding a number inside a sorted array of length M. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship. CONCLUSIONS: The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. 
Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from [Formula: see text] to [Formula: see text] for performing an all-against-all PPI prediction for a database with M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.
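
The lookup idea in this abstract (treat each binary hash code as a number, search a sorted array, then confirm with the Hamming distance) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the 8-bit codes, the `prefix_bits` pre-screen rule, and the `max_dist` threshold are hypothetical, and the abstract does not say how DHL-PPI produces its codes or selects candidates.

```python
from bisect import bisect_left, bisect_right

BITS = 8  # hypothetical code length; real hash codes would be longer


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def prescreen(sorted_codes, query, prefix_bits):
    """Return codes sharing the query's leading bits.

    Codes with a common prefix form one contiguous run in a sorted array,
    so two binary searches find the run. This pre-screen rule is an
    assumption for illustration, not the published procedure.
    """
    shift = BITS - prefix_bits
    lo = (query >> shift) << shift      # smallest code with that prefix
    hi = lo | ((1 << shift) - 1)        # largest code with that prefix
    return sorted_codes[bisect_left(sorted_codes, lo):bisect_right(sorted_codes, hi)]


def search(sorted_codes, query, prefix_bits=4, max_dist=1):
    """Pre-screen by binary search, then verify with the Hamming distance."""
    candidates = prescreen(sorted_codes, query, prefix_bits)
    return [c for c in candidates if hamming(c, query) <= max_dist]


# Toy database of hash codes, sorted once up front.
codes = sorted([0b10110010, 0b10110011, 0b10111111, 0b01000001, 0b10100010])
hits = search(codes, 0b10110010)
```

Note that a prefix-based pre-screen can miss codes that differ from the query only in a leading bit (here `0b10100010`), which is one reason the candidate-selection rule above should be read as a placeholder.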


Subjects
Drug Discovery , Proteins , Amino Acid Sequence , Databases, Protein , Proteins/metabolism
7.
Int J Mol Sci ; 23(17)2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36077322

ABSTRACT

The habanero pepper (Capsicum chinense) is an increasingly important spice and vegetable crop worldwide because of its high capsaicin content and pungent flavor. Diets supplemented with the phytochemicals found in habanero peppers might cause shifts in an organism's metabolism and gene expression. Thus, understanding how these interactions occur can reveal the potential health effects associated with such changes. We performed transcriptomic and metabolomic analyses of Drosophila melanogaster adult flies reared on a habanero pepper diet. We found 539 genes/59 metabolites that were differentially expressed/accumulated in flies fed a pepper versus control diet. Transcriptome results indicated that olfactory sensitivity and behavioral responses to the pepper diet were mediated by olfactory and nutrient-related genes including gustatory receptors (Gr63a, Gr66a, and Gr89a), odorant receptors (Or23a, Or59a, Or82a, and Orco), and odorant-binding proteins (Obp28a, Obp83a, Obp83b, Obp93a, and Obp99a). Metabolome analysis revealed that campesterol, sitosterol, and sucrose were highly upregulated and azelaic acid, ethyl phosphoric acid, and citric acid were the major metabolites downregulated in response to the habanero pepper diet. Further investigation by integration analysis between transcriptome and metabolome data at gene pathway levels revealed six unique enriched pathways: phenylalanine metabolism; insect hormone biosynthesis; pyrimidine metabolism; glyoxylate and dicarboxylate metabolism; glycine, serine, and threonine metabolism; and glycerolipid metabolism. In view of the transcriptome and metabolome findings, our comprehensive analysis of the response to a pepper diet in Drosophila has implications for exploring the molecular mechanism of pepper consumption.


Subjects
Capsicum , Piper nigrum , Animals , Capsicum/chemistry , Capsicum/genetics , Diet , Drosophila melanogaster/genetics , Metabolome , Piper nigrum/genetics , Transcriptome
8.
Nature ; 580(7802): 192-194, 2020 04.
Article in English | MEDLINE | ID: mdl-32214236
9.
Bioinformatics ; 34(10): 1682-1689, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29253072

ABSTRACT

Motivation: Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. Results: We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset sizes. Availability and implementation: The K2 and K2* approaches are implemented in the R language as a package and are freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary information: Supplementary data are available at Bioinformatics online.
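
To give a feel for how a Kendall-type rank statistic can compare sequences without alignment, here is a minimal Python sketch. It is not the K2 statistic itself, whose definition the abstract does not give: it simply computes Kendall's tau-b between the k-mer count vectors of two DNA sequences. The function names, the choice of `k`, and the toy sequences are assumptions for illustration.

```python
from itertools import combinations, product
from math import sqrt


def kmer_counts(seq: str, k: int) -> list[int]:
    """Count every DNA k-mer in seq, returned in a fixed (sorted) k-mer order."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:          # skip k-mers containing other symbols
            counts[kmer] += 1
    return [counts[key] for key in sorted(counts)]


def kendall_tau_b(x: list[int], y: list[int]) -> float:
    """Kendall's tau-b (tie-corrected), computed by the O(n^2) pair scan."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue                # tied in both: contributes to neither term
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom else 0.0


a = kmer_counts("ACGTACGGTACC", 2)
b = kmer_counts("ACGTTCGGTACG", 2)
similarity = kendall_tau_b(a, b)    # in [-1, 1]; 1 means identical rank order
```

Because only the rank order of the counts matters, a statistic of this kind is non-parametric, which matches the description in the abstract.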


Subjects
Phylogeny , Sequence Analysis, DNA/methods , Software , Algorithms , Animals , Sequence Alignment
10.
BMC Bioinformatics ; 19(1): 165, 2018 05 02.
Article in English | MEDLINE | ID: mdl-29720081

ABSTRACT

BACKGROUND: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. RESULTS: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. CONCLUSIONS: Using two different types of applications, namely, clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
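
The pipeline described above (k-mers, then complex numbers, then a stationary wavelet transform, then a numeric feature vector) can be sketched in a few lines of Python. This is a loose illustration under stated assumptions, not SSAW: the nucleotide-to-complex mapping in `BASE`, the averaging rule for a k-mer, the single-level Haar filter with circular extension, and the two energy features are all hypothetical choices.

```python
from math import sqrt

# Hypothetical mapping of nucleotides onto the unit circle.
BASE = {"A": 1 + 0j, "C": 1j, "G": -1 + 0j, "T": -1j}


def kmer_signal(seq: str, k: int) -> list[complex]:
    """Map each overlapping k-mer to one complex number (mean of its bases)."""
    return [sum(BASE[b] for b in seq[i:i + k]) / k
            for i in range(len(seq) - k + 1)]


def haar_swt_level1(signal: list[complex]):
    """One level of an undecimated (stationary) Haar transform.

    No downsampling is applied, so both outputs keep the input length;
    the signal is extended circularly at the right edge.
    """
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / sqrt(2) for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / sqrt(2) for i in range(n)]
    return approx, detail


def features(seq: str, k: int = 3) -> list[float]:
    """Fixed-length feature vector: mean energy of each coefficient band."""
    approx, detail = haar_swt_level1(kmer_signal(seq, k))

    def energy(band):
        return sum(abs(v) ** 2 for v in band) / len(band)

    return [energy(approx), energy(detail)]


vec = features("ACGTACGGTACCAT")
```

Because the feature vector has the same length for any input sequence, vectors from different sequences can be passed directly to a clustering or classification routine.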


Subjects
Algorithms , Computational Biology/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Sequence Homology , Wavelet Analysis , Cluster Analysis , Humans
11.
Molecules ; 23(3)2018 Mar 19.
Article in English | MEDLINE | ID: mdl-29562711

ABSTRACT

In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences), and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods.


Subjects
Algorithms , Models, Molecular , RNA-Binding Proteins/metabolism , RNA/metabolism , Databases, Protein , Nucleic Acid Conformation , RNA/chemistry , RNA, Long Noncoding/metabolism , RNA-Binding Proteins/chemistry
12.
Nucleic Acids Res ; 43(3): 1370-9, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25609700

ABSTRACT

RNA-protein complexes are essential in mediating important fundamental cellular processes, such as transport and localization. In particular, ncRNA-protein interactions play an important role in post-transcriptional gene regulation, such as mRNA localization, mRNA stabilization, poly-adenylation, splicing and translation. Experimental methods for determining RNA-protein interactions remain expensive and time-consuming. Here, we present RPI-Pred (RNA-protein interaction predictor), a new support-vector machine-based method to predict protein-RNA interaction pairs based on both sequences and structures. The results show that RPI-Pred can correctly predict RNA-protein interaction pairs with ∼94% prediction accuracy when using sequences and experimentally determined protein and RNA structures, and with ∼83% when using sequences and predicted protein and RNA structures. Further, our proposed method RPI-Pred was superior to other existing ones by predicting more experimentally validated ncRNA-protein interaction pairs from different organisms. Motivated by the improved performance of RPI-Pred, we further applied our method for reliable construction of ncRNA-protein interaction networks. RPI-Pred is publicly available at: http://ctsb.is.wfubmc.edu/projects/rpi-pred.


Subjects
Protein Interaction Maps , RNA, Untranslated/metabolism , RNA-Binding Proteins/metabolism , Animals , Nucleic Acid Conformation , RNA, Untranslated/chemistry , RNA-Binding Proteins/chemistry , Support Vector Machine
13.
BMC Genomics ; 17 Suppl 4: 544, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27556803

ABSTRACT

BACKGROUND: The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. METHODS: First, we present a new algorithm for the LCS problem. Using the generalized suffix tree, we identify the common substrings shared between the two input sequences. Using the maximal common substrings, we construct a directed acyclic graph (DAG), based on which we determine the LCS as the longest path in the DAG. Then, we introduce an LCS-motivated reference-based compression scheme using the components of the LCS, rather than the LCS itself. RESULTS: Our basic scheme compressed the Homo sapiens genome (with an original size of 3,080,436,051 bytes) to 15,460,478 bytes. An improvement on the basic method further reduced this to 8,556,708 bytes, or an overall compression ratio of 360. This can be compared to the previous state-of-the-art compression ratios of 157 (Wang and Zhang, 2011) and 171 (Pinho, Pratas, and Garcia, 2011). CONCLUSION: We propose a new algorithm to address the longest common subsequence problem. Motivated by our LCS algorithm, we introduce a new reference-based compression scheme for genome resequencing data. Comparative results against state-of-the-art reference-based compression algorithms demonstrate the performance of the proposed method.
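
For readers unfamiliar with the LCS problem named in this abstract, the following Python sketch shows the classical dynamic-programming solution. It is deliberately not the suffix-tree and DAG algorithm the authors describe, which the abstract only outlines; it is the textbook O(mn) method, included to make the problem concrete. The last lines re-derive the reported overall compression ratio from the byte counts quoted above.

```python
def lcs(a: str, b: str) -> str:
    """Longest common subsequence via the classical O(len(a) * len(b)) table."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the top-left corner to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


common = lcs("AGGTAB", "GXTXAYB")

# Compression ratio implied by the byte counts reported in the abstract.
original_bytes = 3_080_436_051
improved_bytes = 8_556_708
ratio = original_bytes / improved_bytes
```

The quadratic table is impractical at genome scale, which is presumably why the paper works from maximal common substrings instead.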


Subjects
Algorithms , Computational Biology/methods , Software , Genome, Human , Humans , Sequence Alignment/methods , Sequence Analysis, DNA/methods
15.
Neural Netw ; 176: 106338, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38692190

ABSTRACT

Electroencephalography (EEG) based Brain Computer Interface (BCI) systems play a significant role in facilitating how individuals with neurological impairments effectively interact with their environment. In real-world applications of BCI systems for clinical assistance and rehabilitation training, the EEG classifier often needs to learn on sequentially arriving subjects in an online manner. As patterns of EEG signals can be significantly different for different subjects, the EEG classifier can easily erase knowledge of learnt subjects after learning on later ones as it performs decoding in an online streaming scenario, a problem known as catastrophic forgetting. In this work, we tackle this problem with a memory-based approach, which considers the following conditions: (1) subjects arrive sequentially in an online manner, with no large-scale dataset available for joint training beforehand, (2) data volume from the different subjects could be imbalanced, (3) decoding difficulty of the sequential streaming signal varies, (4) continual classification for a long time is required. This online sequential EEG decoding problem is more challenging than classic cross-subject EEG decoding as there is no large-scale training data from the different subjects available beforehand. The proposed model keeps a small balanced memory buffer during sequential learning, with memory data dynamically selected based on joint consideration of data volume and informativeness. Furthermore, for the more general scenarios where subject identity is unknown to the EEG decoder, i.e., the subject-agnostic scenario, we propose a kernel-based subject shift detection method that identifies underlying subject changes on the fly in a computationally efficient manner. We develop challenging benchmarks of streaming EEG data from sequentially arriving subjects with both balanced and imbalanced data volumes, and perform extensive experiments with a detailed ablation study on the proposed model. 
The results show the effectiveness of our proposed approach, enabling the decoder to maintain performance on all previously seen subjects over a long period of sequential decoding. The model demonstrates the potential for real-world applications.
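
The idea of a small, balanced memory buffer can be made concrete with a minimal Python sketch. The eviction rule used here (when the buffer is full, drop the least informative sample of the subject that currently holds the most slots) is an assumption chosen for illustration; the abstract only says that selection jointly considers data volume and informativeness. The class name, the `score` argument, and the toy stream are hypothetical.

```python
from collections import defaultdict


class BalancedMemoryBuffer:
    """Fixed-capacity replay buffer that keeps subjects roughly balanced."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.by_subject = defaultdict(list)   # subject -> [(score, sample), ...]

    def __len__(self) -> int:
        return sum(len(items) for items in self.by_subject.values())

    def add(self, subject: str, sample, score: float) -> None:
        """Store a sample; if over capacity, evict from the largest subject."""
        self.by_subject[subject].append((score, sample))
        if len(self) > self.capacity:
            largest = max(self.by_subject, key=lambda s: len(self.by_subject[s]))
            items = self.by_subject[largest]
            # Drop that subject's least informative (lowest-score) sample.
            items.remove(min(items, key=lambda item: item[0]))

    def counts(self) -> dict:
        return {s: len(items) for s, items in self.by_subject.items()}


# Toy stream: subject A arrives first with more data, then subject B.
buf = BalancedMemoryBuffer(capacity=4)
for i, score in enumerate([0.1, 0.2, 0.3, 0.4]):
    buf.add("A", f"A-trial-{i}", score)
for i, score in enumerate([0.5, 0.6]):
    buf.add("B", f"B-trial-{i}", score)
```

After the stream, the buffer holds two samples per subject, and subject A retains its two highest-scoring trials, so replaying the buffer gives earlier subjects equal weight alongside later ones.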


Subjects
Brain-Computer Interfaces , Electroencephalography , Memory , Electroencephalography/methods , Humans , Memory/physiology , Signal Processing, Computer-Assisted , Brain/physiology , Algorithms
16.
Bioinformatics ; 28(10): 1314-23, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22522137

ABSTRACT

MOTIVATION: Markov models are very popular for analyzing complex sequences such as protein sequences, whose sources are unknown, or whose underlying statistical characteristics are not well understood. A major problem is the computational complexity involved with using Markov models, especially the exponential growth of their size with the order of the model. The probabilistic suffix tree (PST) and its improved variant sparse probabilistic suffix tree (SPST) have been proposed to address some of the key problems with Markov models. The use of the suffix tree, however, implies that the space requirement for the PST/SPST could still be high. RESULTS: We present the probabilistic suffix array (PSA), a data structure for representing information in variable length Markov chains. The PSA essentially encodes information in a Markov model by providing a time- and space-efficient alternative to the PST/SPST. Given a sequence of length N, construction and learning in the PSA is done in O(N) time and space, independent of the Markov order. Prediction using the PSA is performed in O(m log(N/|Σ|)) time, where m is the pattern length, and Σ is the symbol alphabet. In terms of modeling and prediction accuracy, using protein families from Pfam 25.0, SPST and PSA produced similar results (SPST 89.82%, PSA 89.56%), but slightly lower than HMMER3 (92.55%). A modified algorithm for PSA prediction improved the performance to 91.7%, or just 0.79% from HMMER3 results. The average (maximum) practical construction space for the protein families tested was 21.58±6.32N (41.11N) bytes using the PSA, 27.55±13.16N (63.01N) bytes using SPST and 47±24.95N (140.3N) bytes for HMMER3. The PSA was 255 times faster to construct than the SPST, and 11 times faster than HMMER3.
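
As a rough illustration of how a suffix array can answer variable-length Markov queries, the Python sketch below estimates P(symbol | context) by counting pattern occurrences with binary search. It is not the PSA: the construction here is a naive sort rather than the O(N) construction the abstract reports, there is no smoothing, and the function names and toy string are assumptions.

```python
def suffix_array(s: str) -> list[int]:
    """Start positions of all suffixes of s in lexicographic order (naive build)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def count_occurrences(s: str, sa: list[int], pattern: str) -> int:
    """Count occurrences of pattern via two binary searches over the suffix array."""
    m = len(pattern)

    def first_index(strictly_greater: bool) -> int:
        # Suffixes truncated to m symbols are still in sorted order,
        # so comparing the truncated prefix is a valid search key.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = s[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strictly_greater and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(True) - first_index(False)


def next_symbol_prob(s: str, sa: list[int], context: str, symbol: str) -> float:
    """Maximum-likelihood estimate of P(symbol | context), no smoothing."""
    alphabet = sorted(set(s))
    total = sum(count_occurrences(s, sa, context + a) for a in alphabet)
    if total == 0:
        return 0.0
    return count_occurrences(s, sa, context + symbol) / total


text = "abracadabra"
sa = suffix_array(text)
p_b_after_a = next_symbol_prob(text, sa, "a", "b")
```

Each query costs O(m log N) string comparisons in this naive form, and the context length can change from query to query without rebuilding anything, which is the property that makes suffix arrays attractive for variable-length Markov chains.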


Subjects
Algorithms , Markov Chains , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein
17.
Sci Rep ; 13(1): 11881, 2023 Jul 23.
Article in English | MEDLINE | ID: mdl-37482553

ABSTRACT

With an increasing number of new scientific papers being released, it becomes harder for researchers to be aware of recent articles in their field of study. Accurately classifying papers is a first step in the direction of personalized catering and easy access to research of interest. The field of Density Functional Theory (DFT) in particular is a good example of a methodology used in very different studies, and interconnected disciplines, which has a very strong community publishing many research articles. We devise a new unsupervised method for classifying publications, based on topic modeling, and use a DFT-related selection of documents as a use case. We first create topics from word analysis and clustering of the abstracts from the publications, then attribute each publication/paper to a topic based on word similarity. We then make interesting observations by analyzing connections between the topics and publishers, journals, country or year of publication. The proposed approach is general, and can be applied to analyze publication and citation trends in other areas of study, beyond the field of Density Functional Theory.

18.
IEEE J Biomed Health Inform ; 27(6): 2818-2828, 2023 06.
Article in English | MEDLINE | ID: mdl-37028019

ABSTRACT

The automatic classification of electrocardiogram (ECG) signals has played an important role in cardiovascular disease diagnosis and prediction. With recent advancements in deep neural networks (DNNs), particularly Convolutional Neural Networks (CNNs), learning deep features automatically from the original data is becoming an effective and widespread approach in a variety of intelligent tasks including biomedical and health informatics. However, most of the existing approaches are trained on either 1D CNNs or 2D CNNs, and they suffer from the limitations of random phenomena (i.e. random initial weights). Furthermore, the ability to train such DNNs in a supervised manner in healthcare is often limited due to the scarcity of labeled training data. To address the problems of weight initialization and limited annotated data, in this work, we leverage a recent self-supervised learning technique, namely contrastive learning, and present supervised contrastive learning (sCL). Different from existing self-supervised contrastive learning approaches, which often generate false negatives because of random selection of negative anchors, our contrastive learning makes use of labeled data to pull the same class closer together and push different classes far apart to avoid potential false negatives. Furthermore, unlike other kinds of signals (e.g. speech, image, video), the ECG signal is sensitive to changes, and inappropriate transformation could directly affect diagnosis results. To deal with this issue, we present two semantic transformations, i.e. semantic split-join and semantic weighted peaks noise smoothing. The proposed deep neural network sCL-ST with supervised contrastive learning and semantic transformations is trained as an end-to-end framework for the multi-label classification of 12-lead ECGs. Our sCL-ST network contains two sub-networks, i.e. a pre-text task and a down-stream task. 
We evaluated our network on the 12-lead PhysioNet 2020 dataset, and the experimental results show that it outperforms existing state-of-the-art approaches.
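
The phrase "pull the same class closer together and push different classes far apart" corresponds to a supervised contrastive loss, which can be sketched in plain Python. This follows the general single-label form of that loss and is not the sCL-ST objective, whose multi-label details the abstract does not give; the `temperature` value and the toy 2-D embeddings are assumptions for illustration.

```python
from math import exp, log, sqrt


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sup_con_loss(embeddings, labels, temperature: float = 0.1) -> float:
    """Supervised contrastive loss averaged over anchors that have a positive.

    For each anchor, every other sample with the same label is a positive,
    and all other samples appear in the softmax denominator.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        log_denom = log(sum(exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Two tight clusters; labels either agree with the clusters or cut across them.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = sup_con_loss(points, [0, 0, 1, 1])
loss_crossed = sup_con_loss(points, [0, 1, 0, 1])
```

The loss is small when samples sharing a label are already close in embedding space and large when they are not, so minimizing it pushes the encoder toward class-wise clusters without relying on randomly chosen negatives.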


Subjects
Medical Informatics , Semantics , Humans , Electrocardiography , Arrhythmias, Cardiac/diagnosis , Neural Networks, Computer
19.
ACS ES T Eng ; 3(10): 1424-1467, 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37854077

ABSTRACT

Municipal and agricultural organic waste can be treated to recover energy, nutrients, and carbon through resource recovery and carbon capture (RRCC) technologies such as anaerobic digestion, struvite precipitation, and pyrolysis. Data science could benefit such technologies by improving their efficiency through data-driven process modeling along with reducing environmental and economic burdens via life cycle assessment (LCA) and techno-economic analysis (TEA), respectively. We critically reviewed 616 peer-reviewed articles on the use of data science in RRCC published during 2002-2022. Although applications of machine learning (ML) methods have drastically increased over time for modeling RRCC technologies, the reviewed studies exhibited significant knowledge gaps at various model development stages. In terms of sustainability, an increasing number of studies included LCA with TEA to quantify both environmental and economic impacts of RRCC. Integration of ML methods with LCA and TEA has the potential to cost-effectively investigate the trade-off between efficiency and sustainability of RRCC, although the literature lacked such integration of techniques. Therefore, we propose an integrated data science framework to inform efficient and sustainable RRCC from organic waste based on the review. Overall, the findings from this review can inform practitioners about the effective utilization of various data science methods for real-world implementation of RRCC technologies.

20.
Cancers (Basel) ; 15(21)2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37958322

ABSTRACT

Bone marrow mesenchymal stem cells (BM MSCs) play a tumor-supportive role in promoting drug resistance and disease relapse in multiple myeloma (MM). Recent studies have discovered a sub-population of MSCs, known as inflammatory MSCs (iMSCs), exclusive to the MM BM microenvironment and implicated in drug resistance. Through a sophisticated analysis of public expression data from unexpanded BM MSCs, we uncovered a positive association between iMSC signature expression and minimal residual disease. While in vitro expansion generally results in the loss of the iMSC signature, our meta-analysis of additional public expression data demonstrated that cytokine stimulation, including IL1-β and TNF-α, as well as immune cells such as neutrophils, macrophages, and MM cells, can reactivate the signature expression of iMSCs to varying extents. These findings underscore the importance and potential utility of cytokine stimulation in mimicking the gene expression signature of early passage of iMSCs for functional characterizations of their tumor-supportive roles in MM.
