Results 1 - 20 of 89
1.
BMC Bioinformatics ; 25(1): 267, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39160480

ABSTRACT

BACKGROUND: The use of long reads for single nucleotide polymorphism (SNP) phasing has become popular, providing substantial support for research on human diseases and genetic studies in animals and plants. However, due to the complexity of the linkage relationships between SNP loci and sequencing errors in the reads, recent methods still cannot yield satisfactory results. RESULTS: In this study, we present a graph-based algorithm, GCphase, which uses a minimum-cut algorithm to perform phasing. First, based on the alignment between long reads and the reference genome, GCphase filters out ambiguous SNP sites and useless read information. Second, GCphase constructs a graph in which a vertex represents alleles of an SNP locus and each edge represents the presence of read support; moreover, GCphase adopts a graph minimum-cut algorithm to phase the SNPs. Next, GCphase uses two error correction steps to refine the phasing results obtained from the previous step, effectively reducing the error rate. Finally, GCphase obtains the phase block. GCphase was compared to three other methods, WhatsHap, HapCUT2, and LongPhase, on the Nanopore and PacBio long-read datasets. The code is available from https://github.com/baimawjy/GCphase. CONCLUSIONS: Experimental results show that, across different datasets and sequencing depths, GCphase has the fewest switch errors and the highest accuracy among the compared methods.


Subjects
Algorithms, Single Nucleotide Polymorphism, Single Nucleotide Polymorphism/genetics, Humans, DNA Sequence Analysis/methods, Software, High-Throughput Nucleotide Sequencing/methods
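Illustrative sketch (not the GCphase code): the abstract for result 1 describes a graph whose vertices are SNP alleles and whose edges record read support, partitioned by a minimum cut. The toy example below assumes one particular construction that the abstract does not specify: each read adds weight 1 between every pair of alleles it carries, and the two alleles of the first locus serve as source and sink of an s-t cut. The filtering and error-correction steps are not reproduced, and all function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_graph(reads):
    """Each read maps locus -> observed allele (0 or 1).
    Co-observed alleles get an undirected edge weighted by read count."""
    cap = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for u, v in combinations(sorted(read.items()), 2):
            cap[u][v] += 1
            cap[v][u] += 1
    return cap

def min_cut_side(cap, s, t):
    """Edmonds-Karp max flow; returns the nodes on the source side
    of a minimum s-t cut."""
    res = {u: dict(vs) for u, vs in cap.items()}  # residual capacities

    def bfs():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if t not in parent:
            return set(parent)  # everything still reachable from s
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][w] for u, w in path)
        for u, w in path:
            res[u][w] -= bottleneck
            res[w][u] = res[w].get(u, 0) + bottleneck

# Six consistent reads plus one read with an error at locus 1.
reads = [
    {0: 0, 1: 1}, {0: 0, 1: 1, 2: 0}, {1: 1, 2: 0},
    {0: 1, 1: 0}, {0: 1, 1: 0, 2: 1}, {1: 0, 2: 1},
    {0: 0, 1: 0},  # erroneous read
]
side = min_cut_side(build_graph(reads), (0, 0), (0, 1))
haplotype = {locus: allele for locus, allele in side}
```

In this toy input the only edge joining the two haplotypes comes from the erroneous read, so the minimum cut removes it and the source side spells out one haplotype.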
2.
Front Genet ; 15: 1378809, 2024.
Article in English | MEDLINE | ID: mdl-39161422

ABSTRACT

Introduction: Developing effective breast cancer survival prediction models is critical to breast cancer prognosis. With the widespread use of next-generation sequencing technologies, numerous studies have focused on survival prediction. However, previous methods predominantly relied on single-omics data, and survival prediction using multi-omics data remains a significant challenge. Methods: In this study, considering the similarity of patients and the relevance of multi-omics data, we propose a novel multi-omics stacked fusion network (MSFN) based on a stacking strategy to predict the survival of breast cancer patients. MSFN first constructs a patient similarity network (PSN) and employs a residual graph neural network (ResGCN) to obtain correlative prognostic information from the PSN. Simultaneously, it employs convolutional neural networks (CNNs) to obtain specific prognostic information from multi-omics data. Finally, MSFN stacks the prognostic information from these networks and feeds it into AdaboostRF for survival prediction. Results: Experimental results demonstrated that our method outperformed several state-of-the-art methods and was biologically validated by Kaplan-Meier analysis and t-SNE.

3.
PeerJ Comput Sci ; 10: e2169, 2024.
Article in English | MEDLINE | ID: mdl-39145235

ABSTRACT

The Boolean satisfiability (SAT) problem exhibits different structural features in various domains. Compared with traditional rule-based approaches, neural network models can serve as more general algorithms that learn to solve specific problems from data in different domains. Accurately identifying these structural features is crucial for neural networks to solve the SAT problem. Currently, learning-based SAT solvers, whether they are end-to-end models or enhancements to traditional heuristic algorithms, have achieved significant progress. In this article, we propose TG-SAT, an end-to-end framework based on Transformer and gated recurrent neural network (GRU) for predicting the satisfiability of SAT problems. TG-SAT can learn the structural features of SAT problems in a weakly supervised environment. To capture the structural information of the SAT problem, we encode a SAT problem as an undirected graph and integrate GRU into the Transformer structure to update the node embeddings. By computing cross-attention scores between literals and clauses, a weighted representation of nodes is obtained. The model is eventually trained as a classifier to predict the satisfiability of the SAT problem. Experimental results demonstrate that TG-SAT achieves a 2%-5% improvement in accuracy on random 3-SAT problems compared to NeuroSAT. It also performs better on SR(N), especially on more complex SAT problems, where our model achieves higher prediction accuracy.
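Illustrative sketch (not the TG-SAT code): the abstract for result 3 mentions encoding a SAT problem as an undirected graph over literals and clauses. A minimal version of such an encoding is shown below, assuming DIMACS-style integer literals. The node naming and the extra edge between a literal and its negation are assumptions made for illustration and are not taken from the abstract.

```python
from collections import defaultdict

def cnf_to_graph(clauses):
    """Build an undirected literal-clause graph from a CNF formula.

    `clauses` uses DIMACS-style integers: 3 means x3, -3 means NOT x3.
    Node names ("lit", n) and ("clause", i) are illustrative only.
    """
    adj = defaultdict(set)
    for i, clause in enumerate(clauses):
        c = ("clause", i)
        for lit in clause:
            node = ("lit", lit)
            # Literal-clause membership edge (both directions).
            adj[c].add(node)
            adj[node].add(c)
            # Assumed extra edge tying a literal to its negation.
            adj[node].add(("lit", -lit))
            adj[("lit", -lit)].add(node)
    return dict(adj)

# (x1 OR NOT x2) AND (x2 OR x3)
graph = cnf_to_graph([[1, -2], [2, 3]])
```

A message-passing or attention model would then update one embedding per node along these edges.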

4.
Comput Biol Med ; 179: 108792, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38964242

ABSTRACT

BACKGROUND AND OBJECTIVE: Concerns about patient privacy have limited the application of medical deep learning models in certain real-world scenarios. Differential privacy (DP) can alleviate this problem by injecting random noise into the model. However, naively applying DP to medical models will not achieve a satisfactory balance between privacy and utility due to the high dimensionality of medical models and the limited labeled samples. METHODS: This work proposes the DP-SSLoRA model, a privacy-preserving classification model for medical images combining differential privacy with self-supervised low-rank adaptation. In this work, a self-supervised pre-training method is used to obtain enhanced representations from unlabeled publicly available medical data. Then, a low-rank decomposition method is employed to mitigate the impact of differentially private noise and combined with pre-trained features to conduct the classification task on private datasets. RESULTS: In the classification experiments using three real chest X-ray datasets, DP-SSLoRA achieves good performance with strong privacy guarantees. With ɛ=2, it attains an AUC of 0.942 on RSNA, 0.9658 on Covid-QU-mini, and 0.9886 on Chest X-ray 15k. CONCLUSION: Extensive experiments on real chest X-ray datasets show that DP-SSLoRA can achieve satisfactory performance with stronger privacy guarantees. This study provides guidance for studying privacy preservation in the medical field. Source code is publicly available at https://github.com/oneheartforone/DP-SSLoRA.


Subjects
Privacy, Humans, Deep Learning, COVID-19, SARS-CoV-2, Algorithms
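Illustrative sketch (not the DP-SSLoRA code): the abstract for result 4 says that differential privacy injects random noise and that a low-rank decomposition mitigates its impact. The abstract does not name the noise mechanism, so the example below assumes the common clip-then-add-Gaussian-noise recipe for gradients, and shows how a rank-r factorization shrinks the number of trainable (and therefore noised) parameters. All names and numbers are hypothetical, and the mapping from the noise multiplier to a privacy budget ɛ is not computed.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

def low_rank_param_count(d, k, r):
    """Trainable parameters when a d x k update is factored as (d x r)(r x k)."""
    return r * (d + k)

rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]
noiseless = private_mean_gradient(grads, 1.0, 0.0, rng)
noisy = private_mean_gradient(grads, 1.0, 1.1, rng)

full = 768 * 768                              # dense update
low_rank = low_rank_param_count(768, 768, 8)  # rank-8 update
```

Because noise is added per coordinate, training 12,288 parameters instead of 589,824 means far fewer noisy coordinates for the same clipping bound, which is one intuition for why a low-rank update can soften the utility loss.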
5.
J Phys Chem Lett ; 15(27): 7055-7060, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38949914

ABSTRACT

The low thermal conductivity of group IV-VI semiconductors is often attributed to the soft phonons and giant anharmonicity observed in these materials. However, there is still no broad consensus on the fundamental origin of this giant anharmonic effect. Utilizing first-principles calculations and group symmetry analysis, we find that the cation lone-pair s electrons in IV-VI materials cause a significant coupling between occupied cation s orbitals and unoccupied cation p orbitals due to the symmetry reduction when atoms vibrate away from their equilibrium positions under heating. This leads to an electronic energy gain, consequently flattening the potential energy surface and causing soft phonons and strong anharmonic effects. Our findings provide an intrinsic understanding of the low thermal conductivity in IV-VI compounds by connecting the anharmonicity with the dynamical electronic structures, and can also be extended to a large family of hybrid systems with lone-pair electrons, for promising thermoelectric applications and predictive designs.

6.
Front Bioinform ; 4: 1403826, 2024.
Article in English | MEDLINE | ID: mdl-39077754

ABSTRACT

The identification of cancer subtypes plays a very important role in the field of medicine. Accurate identification of cancer subtypes is helpful for both cancer treatment and prognosis. Currently, most methods for cancer subtype identification are based on single-omics data, such as gene expression data. However, multi-omics data can show various characteristics of cancer, which can also improve the accuracy of cancer subtype identification. Therefore, how to extract features from multi-omics data for cancer subtype identification is the main challenge currently faced by researchers. In this paper, we propose a cancer subtype identification method named CAEM-GBDT, which takes gene expression data, miRNA expression data, and DNA methylation data as input, and adopts a convolutional autoencoder network to identify cancer subtypes. Through a convolutional encoder layer, the method performs feature extraction on the input data. Within the convolutional encoder layer, a convolutional self-attention module is embedded to recognize higher-level representations of the multi-omics data. The extracted high-level representations from the convolutional encoder are then concatenated with the input to the decoder. The GBDT (Gradient Boosting Decision Tree) is utilized for cancer subtype identification. In the experiments, we compare CAEM-GBDT with existing cancer subtype identification methods. Experimental results demonstrate that the proposed CAEM-GBDT outperforms other methods. The source code is available from GitHub at https://github.com/gxh-1/CAEM-GBDT.git.

7.
Front Pharmacol ; 15: 1398231, 2024.
Article in English | MEDLINE | ID: mdl-38835667

ABSTRACT

Synthetic lethality (SL) is widely used to discover anti-cancer drug targets. However, the identification of SL interactions through wet experiments is costly and inefficient. Hence, the development of efficient and high-accuracy computational methods for SL interaction prediction is of great significance. In this study, we propose MPASL, a multi-perspective learning knowledge graph attention network to enhance synthetic lethality prediction. MPASL utilizes knowledge graph hierarchy propagation to explore multi-source neighbor nodes related to genes. The knowledge graph ripple propagation expands gene representations through existing gene SL preference sets. MPASL can learn the gene representations from both the gene-entity and entity-entity perspectives. Specifically, based on the aggregation method, we learn to obtain gene-oriented entity embeddings. Then, the gene representations are refined by comparing the various layer-wise neighborhood features of entities using the discrepancy contrastive technique. Finally, the learned gene representation is applied in SL prediction. Experimental results demonstrated that MPASL outperforms several state-of-the-art methods. Additionally, case studies have validated the effectiveness of MPASL in identifying SL interactions between genes.

8.
Front Genet ; 15: 1404415, 2024.
Article in English | MEDLINE | ID: mdl-38798694

ABSTRACT

Motivation: Genomic structural variation refers to chromosomal level variations such as genome rearrangement or insertion/deletion, which typically involve larger DNA fragments compared to single nucleotide variations. Deletion is a common type of structural variant in the genome, which may lead to many diseases, so the detection of deletions can help to gain insights into the pathogenesis of diseases and provide accurate information for disease diagnosis, treatment, and prevention. Many tools exist for deletion variant detection, but they are still inadequate in some aspects, and most of them ignore the presence of chimeric variants in clustering, resulting in less precise clustering results. Results: In this paper, we present LcDel, which can detect deletion variation based on clustering and long reads. LcDel first finds the candidate deletion sites and then performs the first clustering step using two clustering methods (sliding window-based and coverage-based, respectively) based on the length of the deletion. After that, LcDel immediately performs a second clustering step, using hierarchical clustering, to determine the location and length of the deletion. LcDel is benchmarked against some other structural variation detection tools on multiple datasets, and the results show that LcDel has better detection performance for deletion. The source code is available at https://github.com/cyq1314woaini/LcDel.
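Illustrative sketch (not the LcDel code): the abstract for result 8 mentions a sliding window-based first clustering step over candidate deletion sites. The example below shows one simple way such a step could work, assuming each candidate is a hypothetical (start, length) pair and that a candidate joins the current cluster when its start lies within a fixed window of the previous start. The coverage-based and hierarchical steps are not shown, and the window size is arbitrary.

```python
def window_cluster(candidates, window):
    """Group candidate deletions whose start positions lie within
    `window` bp of the previous candidate's start."""
    clusters = []
    for start, length in sorted(candidates):
        if clusters and start - clusters[-1][-1][0] <= window:
            clusters[-1].append((start, length))
        else:
            clusters.append([(start, length)])
    return clusters

def summarize(cluster):
    """Median start, median length, and supporting-read count of a cluster."""
    starts = sorted(s for s, _ in cluster)
    lengths = sorted(l for _, l in cluster)
    mid = len(cluster) // 2
    return starts[mid], lengths[mid], len(cluster)

# Hypothetical signatures from four reads: (start position, deletion length).
candidates = [(1000, 50), (5000, 300), (1005, 52), (1010, 48)]
clusters = window_cluster(candidates, window=50)
calls = [summarize(c) for c in clusters]
```

Each resulting tuple is a candidate call with a read-support count that a later step could filter or refine.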

9.
Front Genet ; 15: 1393406, 2024.
Article in English | MEDLINE | ID: mdl-38770419

ABSTRACT

Motivation: In recent years, there have been significant advances in various chromatin conformation capture techniques, and annotating the topological structure from Hi-C contact maps has become crucial for studying the three-dimensional structure of chromosomes. However, the structure and function of chromatin loops are highly dynamic and diverse, influenced by multiple factors. Therefore, obtaining the three-dimensional structure of the genome remains a challenging task. Among many chromatin loop prediction methods, it is difficult to fully extract features from the contact map and make accurate predictions at low sequencing depths. Results: In this study, we put forward a deep learning framework based on the diffusion model called CD-Loop for predicting accurate chromatin loops. First, by pre-training the input data, we obtain prior probabilities for predicting the classification of the Hi-C contact map. Then, by combining the denoising process based on the diffusion model and the prior probability obtained by pre-training, candidate loops were predicted from the input Hi-C contact map. Finally, CD-Loop uses a density-based clustering algorithm to cluster the candidate chromatin loops and predict the final chromatin loops. We compared CD-Loop with the currently popular methods, such as Peakachu, Chromosight, and Mustache, and found that in different cell types, species, and sequencing depths, CD-Loop outperforms other methods in loop annotation. We conclude that CD-Loop can accurately predict chromatin loops and reveal cell-type specificity. The code is available at https://github.com/wangyang199897/CD-Loop.
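Illustrative sketch (not the CD-Loop code): the abstract for result 9 says candidate loops are grouped with a density-based clustering algorithm but does not name one. The example below assumes DBSCAN, a widely used density-based method, applied to hypothetical candidate loops given as (bin_i, bin_j) coordinates on a contact map. The `eps` and `min_pts` values are arbitrary.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points; returns one label per point (-1 = noise)."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, keep growing
    return labels

# Hypothetical candidate loop anchors (row bin, column bin).
candidates = [(10, 20), (11, 20), (10, 21),
              (50, 80), (51, 81), (50, 81),
              (200, 5)]
labels = dbscan(candidates, eps=2, min_pts=3)
```

Each cluster could then be collapsed to a single representative pixel, while isolated candidates are discarded as noise.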

10.
Article in English | MEDLINE | ID: mdl-38683714

ABSTRACT

Bridge detection in remote sensing images (RSIs) plays a crucial role in various applications, but it poses unique challenges compared to the detection of other objects. In RSIs, bridges exhibit considerable variations in terms of their spatial scales and aspect ratios. Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs. However, the lack of datasets with large-size VHR RSIs limits the deep learning algorithms' performance on bridge detection. Due to the limitation of GPU memory in tackling large-size images, deep learning-based object detection methods commonly adopt the cropping strategy, which inevitably results in label fragmentation and discontinuous prediction. To ameliorate the scarcity of datasets, this paper proposes a large-scale dataset named GLH-Bridge comprising 6,000 VHR RSIs sampled from diverse geographic locations across the globe. These images encompass a wide range of sizes, varying from 2,048 × 2,048 to 16,384 × 16,384 pixels, and collectively feature 59,737 bridges. These bridges span diverse backgrounds, and each of them has been manually annotated, using both an oriented bounding box (OBB) and a horizontal bounding box (HBB). Furthermore, we present an efficient network for holistic bridge detection (HBD-Net) in large-size RSIs. The HBD-Net presents a separate detector-based feature fusion (SDFF) architecture and is optimized via a shape-sensitive sample re-weighting (SSRW) strategy. The SDFF architecture performs inter-layer feature fusion (IFF) to incorporate multi-scale context in the dynamic image pyramid (DIP) of the large-size image, and the SSRW strategy is employed to ensure an equitable balance in the regression weight of bridges with various aspect ratios. 
Based on the proposed GLH-Bridge dataset, we establish a bridge detection benchmark including the OBB and HBB tasks, and validate the effectiveness of the proposed HBD-Net. Additionally, cross-dataset generalization experiments on two publicly available datasets illustrate the strong generalization capability of the GLH-Bridge dataset. The dataset and source code will be released at https://luo-z13.github.io/GLH-Bridge-page/.

11.
Front Pharmacol ; 15: 1337764, 2024.
Article in English | MEDLINE | ID: mdl-38384286

ABSTRACT

Accurately identifying novel indications for drugs is crucial in drug research and discovery. Traditional drug discovery is costly and time-consuming. Computational drug repositioning can provide an effective strategy for discovering potential drug-disease associations. However, the known experimentally verified drug-disease associations are relatively sparse, which may affect the prediction performance of computational drug repositioning methods. Moreover, while the existing drug-disease prediction method based on a metric learning algorithm has achieved better performance, it learns features of drugs and diseases only from the drug-centered perspective and cannot comprehensively model the latent features of drugs and diseases. In this study, we propose a novel drug repositioning method named RSML-GCN, which applies a graph convolutional network and reinforcement symmetric metric learning to predict potential drug-disease associations. RSML-GCN first constructs a drug-disease heterogeneous network by integrating the association and feature information of drugs and diseases. Then, the graph convolutional network (GCN) is applied to complement the drug-disease association information. Finally, reinforcement symmetric metric learning with adaptive margin is designed to learn the latent vector representation of drugs and diseases. Based on the learned latent vector representation, novel drug-disease associations can be identified by the metric function. Comprehensive experiments on benchmark datasets demonstrated the superior prediction performance of RSML-GCN for drug repositioning.

12.
iScience ; 27(3): 109148, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38405609

ABSTRACT

Drug-drug interactions (DDIs) can produce unpredictable pharmacological effects and lead to adverse events that have the potential to cause irreversible damage to the organism. Traditional methods to detect DDIs through biological or pharmacological analysis are time-consuming and expensive; therefore, there is an urgent need to develop computational methods to effectively predict drug-drug interactions. Currently, deep learning and knowledge graph techniques, which can effectively extract features of entities, have been widely utilized to develop DDI prediction methods. In this research, we aim to systematically review DDI prediction studies that apply deep learning and knowledge graphs. The available biomedical data and public databases related to drugs are first summarized in this review. Then, we discuss the existing DDI prediction methods that have utilized deep learning and knowledge graph techniques and group them into three main classes: deep learning-based methods, knowledge graph-based methods, and methods that combine deep learning with knowledge graphs. We comprehensively analyze the commonly used drug-related data and various DDI prediction methods, and compare these prediction methods on benchmark datasets. Finally, we briefly discuss the challenges related to DDI prediction, including asymmetric DDI prediction and high-order DDI prediction.

13.
Nat Commun ; 15(1): 618, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38242877

ABSTRACT

Germanium (Ge) is an attractive material for Silicon (Si) compatible optoelectronics, but the nature of its indirect bandgap renders it an inefficient light emitter. Drawing inspiration from the significant expansion of Ge volume upon lithiation as a Lithium (Li) ion battery anode, here, we propose incorporating Li atoms into the Ge to cause lattice expansion to achieve the desired tensile strain for a transition from an indirect to a direct bandgap. Our first-principles calculations show that a minimal amount of 3 at.% Li can convert Ge from an indirect to a direct bandgap to possess a dipole transition matrix element comparable to that of typical direct bandgap semiconductors. To enhance compatibility with Si Complementary-Metal-Oxide-Semiconductors (CMOS) technology, we additionally suggest implanting noble gas atoms instead of Li atoms. We also demonstrate the tunability of the direct-bandgap emission wavelength through the manipulation of dopant concentration, enabling coverage of the mid-infrared to far-infrared spectrum. This Ge-based light-emitting approach presents exciting prospects for surpassing the physical limitations of Si technology in the field of photonics and calls for experimental proof-of-concept studies.

14.
Sci Adv ; 9(50): eadi1618, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38100591

ABSTRACT

Ultrafast interaction between the femtosecond laser pulse and the magnetic metal provides an efficient way to manipulate the magnetic states of matter. Numerous experimental advancements have been made on multilayer metallic films in the last two decades. However, the underlying physics remains unclear. Here, relying on an efficient ab initio spin dynamics simulation algorithm, we revealed the physics that can unify the progress in different experiments. We found that light-induced ultrafast spin transport in multilayer metallic films originates from the sp-d spin-exchange interaction, which can induce an ultrafast, large, and pure spin current from ferromagnetic metal to nonmagnetic metal without charge carrier transport. The resulting trends of spin demagnetization and spin flow are consistent with most experiments. It can explain a variety of ultrafast light-spin manipulation experiments with different systems and different pump-probe technologies, covering a wide range of work in this field.

15.
BMC Bioinformatics ; 24(1): 367, 2023 Sep 30.
Article in English | MEDLINE | ID: mdl-37777712

ABSTRACT

BACKGROUND: Obtaining accurate drug-target binding affinity (DTA) information is significant for drug discovery and drug repositioning. Although some methods have been proposed for predicting DTA, the features of proteins and drugs still need to be further analyzed. Recently, deep learning has been successfully used in many fields. Hence, designing a more effective deep learning method for predicting DTA remains attractive. RESULTS: Dynamic graph DTA (DGDTA), which uses a dynamic graph attention network combined with a bidirectional long short-term memory (Bi-LSTM) network to predict DTA is proposed in this paper. DGDTA adopts drug compound as input according to its corresponding simplified molecular input line entry system (SMILES) and protein amino acid sequence. First, each drug is considered a graph of interactions between atoms and edges, and dynamic attention scores are used to consider which atoms and edges in the drug are most important for predicting DTA. Then, Bi-LSTM is used to better extract the contextual information features of protein amino acid sequences. Finally, after combining the obtained drug and protein feature vectors, the DTA is predicted by a fully connected layer. The source code is available from GitHub at https://github.com/luojunwei/DGDTA . CONCLUSIONS: The experimental results show that DGDTA can predict DTA more accurately than some other methods.


Subjects
Drug Delivery Systems, Drug Discovery, Amino Acid Sequence, Drug Repositioning, Protein Domains
16.
BMC Bioinformatics ; 24(1): 289, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37468832

ABSTRACT

BACKGROUND: Cancer subtype classification is helpful for personalized cancer treatment. Although some approaches have been developed to classify cancer subtypes based on high-dimensional gene expression data, it is difficult to obtain satisfactory classification results. Meanwhile, some cancers have been well studied and classified into subtypes that are adopted by most researchers. Hence, this a priori knowledge is significant for further identifying new meaningful subtypes. RESULTS: In this paper, we present ForestSubtype, a combined parallel random forest and autoencoder approach for cancer subtype identification based on high-dimensional gene expression data. ForestSubtype first adopts the parallel RF and the a priori knowledge of cancer subtypes to train a module and extract significant candidate features. Second, ForestSubtype uses a random forest as the base module and ten parallel random forests to compute each feature weight and rank them separately. Then, the intersection of the features with the larger weights output by the ten parallel random forests is taken as our subsequent candidate features. Third, ForestSubtype uses an autoencoder to condense the selected features into two-dimensional data. Fourth, ForestSubtype utilizes k-means++ to obtain new cancer subtype identification results. In this paper, the breast cancer gene expression data obtained from The Cancer Genome Atlas are used for training and validation, and an independent breast cancer dataset from the Molecular Taxonomy of Breast Cancer International Consortium is used for testing. Additionally, we use two other cancer datasets for validating the generalizability of ForestSubtype. ForestSubtype outperforms the other two methods in terms of the distribution of clusters and internal and external metric results. The open-source code is available at https://github.com/lffyd/ForestSubtype.
CONCLUSIONS: Our work shows that the combination of high-dimensional gene expression data and parallel random forests and autoencoder, guided by a priori knowledge, can identify new subtypes more effectively than existing methods of cancer subtype classification.


Subjects
Breast Neoplasms, Random Forest, Humans, Female, Genomics, Software
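Illustrative sketch (not the ForestSubtype code): the abstract for result 16 describes taking the intersection of the highest-weighted features reported by several parallel random forests. The example below shows only that consensus step, assuming each forest's output is already available as a hypothetical mapping from feature name to weight. Training the forests, the autoencoder, and k-means++ are outside its scope, and the gene names and cutoff `k` are made up.

```python
def top_k(weights, k):
    """Names of the k features with the largest weights."""
    return set(sorted(weights, key=weights.get, reverse=True)[:k])

def consensus_features(weight_maps, k):
    """Features that rank in the top k for every forest."""
    return set.intersection(*(top_k(w, k) for w in weight_maps))

# Hypothetical feature weights from three forests (the abstract uses ten).
forests = [
    {"g1": 0.50, "g2": 0.30, "g3": 0.15, "g4": 0.05},
    {"g1": 0.40, "g3": 0.35, "g2": 0.20, "g4": 0.05},
    {"g2": 0.45, "g1": 0.40, "g3": 0.10, "g4": 0.05},
]
selected = consensus_features(forests, k=2)
```

Requiring agreement across forests trades recall for stability: a feature survives only if no single forest ranks it low.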
17.
J Phys Chem Lett ; 14(29): 6647-6657, 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37462525

ABSTRACT

This Perspective focuses on recent advances in understanding ultrafast processes involved in photoinduced structural phase transitions and proposes a strategy for precise manipulation of such transitions. It has been demonstrated that photoexcited carriers occupying empty antibonding or bonding states generate atomic driving forces that lead to either stretching or shortening of associated bonds, which in turn induce collective and coherent motions of atoms and yield structural transitions. For instance, phase transitions in IrTe2 and VO2, and nonthermal melting in Si, can be explained by the occupation of specific local bonding or antibonding states during laser excitation. These cases reveal the electronic-orbital-selective nature of laser-induced structural transitions. Based on this understanding, we propose an inverse design protocol for achieving or preventing a target structural transition by controlling the related electron occupations with orbital-selective photoexcitation. Overall, this Perspective provides a comprehensive overview of recent advancements in dynamical structural control in solid materials.

18.
Opt Express ; 31(11): 17921-17929, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37381513

ABSTRACT

Germanium-on-insulator (GOI) has emerged as a novel platform for Ge-based electronic and photonic applications. Discrete photonic devices, such as waveguides, photodetectors, modulators, and optical pumping lasers, have been successfully demonstrated on this platform. However, there is almost no report on the electrically injected Ge light source on the GOI platform. In this study, we present the first fabrication of vertical Ge p-i-n light-emitting diodes (LEDs) on a 150 mm GOI substrate. The high-quality Ge LED on a 150-mm diameter GOI substrate was fabricated via direct wafer bonding followed by ion implantations. As a tensile strain of 0.19% has been introduced during the GOI fabrication process resulting from the thermal mismatch, the LED devices exhibit a dominant direct bandgap transition peak near 0.785 eV (∼1580 nm) at room temperature. In sharp contrast to conventional III-V LEDs, we found that the electroluminescence (EL)/photoluminescence (PL) spectra show enhanced intensities as the temperature is raised from 300 to 450 K as a consequence of the higher occupation of the direct bandgap. The maximum enhancement in EL intensity is a factor of 140% near 1635 nm due to the improved optical confinement offered by the bottom insulator layer. This work potentially broadens the GOI's functional variety for applications in near-infrared sensing, electronics, and photonics.
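Worked check (added for illustration): the abstract for result 18 pairs a transition energy of 0.785 eV with a wavelength of about 1580 nm. The two are related by the standard photon energy-wavelength conversion, using hc ≈ 1239.84 eV·nm:

```latex
\lambda = \frac{hc}{E}
        \approx \frac{1239.84\ \text{eV}\cdot\text{nm}}{0.785\ \text{eV}}
        \approx 1579\ \text{nm}
```

This agrees with the quoted value of roughly 1580 nm.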

19.
Front Genet ; 14: 1189775, 2023.
Article in English | MEDLINE | ID: mdl-37388936

ABSTRACT

The role and biological impact of structural variation (SV) are increasingly evident. Deletion accounts for 40% of SV and is an important type of SV. Therefore, it is of great significance to detect and genotype deletions. At present, highly accurate long reads can be obtained as HiFi reads. Accurate long reads can also be obtained by combining error-prone long reads with highly accurate short reads. These accurate long reads are helpful for detecting and genotyping SVs. However, due to the complexity of genome and alignment information, detecting and genotyping SVs remains a challenging task. Here, we propose LSnet, an approach for detecting and genotyping deletions with a deep learning network. Because deep learning can learn complex features from labeled datasets, it is beneficial for detecting SV. First, LSnet divides the reference genome into continuous sub-regions. Based on the alignment between the sequencing data (the combination of error-prone long reads and short reads or HiFi reads) and the reference genome, LSnet extracts nine features for each sub-region, and these features are considered as signals of deletion. Second, LSnet uses a convolutional neural network and an attention mechanism to learn critical features in every sub-region. Next, in accordance with the relationship among the continuous sub-regions, LSnet uses a gated recurrent unit (GRU) network to further extract more important deletion signatures. Finally, a heuristic algorithm is presented to determine the location and length of deletions. Experimental results show that LSnet outperforms other methods in terms of the F1 score. The source code is available from GitHub at https://github.com/eioyuou/LSnet.

20.
iScience ; 26(6): 106966, 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37378322

ABSTRACT

As renewable electricity becomes cost competitive with fossil fuel energy sources and environmental concerns increase, the transition to electrified chemical and fuel synthesis pathways becomes increasingly desirable. However, electrochemical systems have traditionally taken many decades to reach commercial scales. Difficulty in scaling up electrochemical synthesis processes comes primarily from difficulty in decoupling and controlling simultaneously the effects of intrinsic kinetics and charge, heat, and mass transport within electrochemical reactors. Tackling this issue efficiently requires a shift in research from an approach based on small datasets, to one where digitalization enables rapid collection and interpretation of large, well-parameterized datasets, using artificial intelligence (AI) and multi-scale modeling. In this perspective, we present an emerging research approach that is inspired by smart manufacturing (SM), to accelerate research, development, and scale-up of electrified chemical manufacturing processes. The value of this approach is demonstrated by its application toward the development of CO2 electrolyzers.
