Results 1 - 20 of 38
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38739759

ABSTRACT

Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.


Subject(s)
Computational Biology , Nucleic Acids , Proteins , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Ligands , Protein Binding , Humans
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388682

ABSTRACT

Proteins play an important role in life activities and are the basic units for performing functions. Accurately annotating functions to proteins is crucial for understanding the intricate mechanisms of life and developing effective treatments for complex diseases. Traditional biological experiments struggle to keep pace with the growing number of known proteins. With the development of high-throughput sequencing technology, a wide variety of biological data provides the possibility to accurately predict protein functions by computational methods. Consequently, many computational methods have been proposed. Due to the diversity of application scenarios, it is necessary to conduct a comprehensive evaluation of these computational methods to determine the suitability of each algorithm for specific cases. In this study, we present a comprehensive benchmark, BeProf, to process data and evaluate representative computational methods. We first collect the latest datasets and analyze the data characteristics. Then, we investigate and summarize 17 state-of-the-art computational methods. Finally, we propose a novel comprehensive evaluation metric, design eight application scenarios and evaluate the performance of existing methods on these scenarios. Based on the evaluation, we provide practical recommendations for different scenarios, enabling users to select the most suitable method for their specific needs. All of these servers can be obtained from https://csuligroup.com/BEPROF and https://github.com/CSUBioGroup/BEPROF.


Subject(s)
Deep Learning , Benchmarking , Proteins , Algorithms , High-Throughput Nucleotide Sequencing
3.
Nucleic Acids Res ; 51(5): e25, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36629262

ABSTRACT

The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein binding residues are predicted as nucleic acid binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure and disorder trained predictors of RBRs to comprehensively assess quality of their predictions. Our empirical analysis that relies on a new and low-similarity benchmark dataset reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design HybridRNAbind meta-model that generates accurate predictions and low amounts of cross-predictions when tested on data that combines structure and disorder-annotated RBRs. We release this meta-model as a convenient webserver which is available at https://www.csuligroup.com/hybridRNAbind/.


Subject(s)
Proteins , RNA-Binding Proteins , RNA , Computational Biology/methods , Protein Databases , Protein Binding/genetics , Proteins/chemistry , RNA/chemistry , RNA-Binding Proteins/chemistry
4.
J Am Chem Soc ; 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38934861

ABSTRACT

The incorporation of three-dimensional structures into drug molecules has demonstrated significant improvements in clinical success. Late-stage saturation of drug molecules provides a direct pathway for this transformation. However, achieving selective and controllable reduction of aromatic rings remains challenging, particularly when multiple aromatic rings coexist. Herein, we present the switchable and chemoselective hydrogenation of benzene and pyridine rings. The utility of the protocol has been comprehensively investigated in diversified substrates with the assistance of a fragment-screening technique. This approach provides convenient access to a diverse array of cyclohexane and piperidine compounds, prevalent in various bioactive molecules and drugs. Furthermore, it discloses promising avenues for applications in the late-stage switchable saturation of drugs, facilitating an increase in the fraction of sp3-carbons which holds the potential to enhance the medicinal properties of drugs.

5.
J Am Chem Soc ; 146(17): 11866-11875, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38621677

ABSTRACT

The available methods of chemical synthesis have arguably contributed to the prevalence of aromatic rings, such as benzene, toluene, xylene, or pyridine, in modern pharmaceuticals. Many such sp2-carbon-rich fragments are now easy to synthesize using high-quality cross-coupling reactions that click together an ever-expanding menu of commercially available building blocks, but the products are flat and lipophilic, decreasing their odds of becoming marketed drugs. Converting flat aromatic molecules into saturated analogues with a higher fraction of sp3 carbons could improve their medicinal properties and facilitate the invention of safe, efficacious, metabolically stable, and soluble medicines. In this study, we show that aromatic and heteroaromatic drugs can be readily saturated under exceptionally mild rhodium-catalyzed hydrogenation, acid-mediated reduction, or photocatalyzed-hydrogenation conditions, converting sp2 carbon atoms into sp3 carbon atoms and leading to saturated molecules with improved medicinal properties. These methods are productive in diverse pockets of chemical space, producing complex saturated pharmaceuticals bearing a variety of functional groups and three-dimensional architectures. The rhodium-catalyzed method tolerates traces of dimethyl sulfoxide (DMSO) or water, meaning that pharmaceutical compound collections, which are typically stored in wet DMSO, can finally be reformatted for use as substrates for chemical synthesis. This latter application is demonstrated through the late-stage saturation (LSS) of 768 complex and densely functionalized small-molecule drugs.


Subject(s)
Rhodium , Catalysis , Rhodium/chemistry , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/chemical synthesis , Hydrogenation , Molecular Structure
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34498677

ABSTRACT

Long non-coding RNAs (lncRNAs) are a class of RNA molecules with more than 200 nucleotides. A growing amount of evidence reveals that subcellular localization of lncRNAs can provide valuable insights into their biological functions. Existing computational methods for predicting lncRNA subcellular localization use k-mer features to encode lncRNA sequences. However, the sequence order information is lost by using only k-mer features. We proposed a deep learning framework, DeepLncLoc, to predict lncRNA subcellular localization. In DeepLncLoc, we introduced a new subsequence embedding method that keeps the order information of lncRNA sequences. The subsequence embedding method first divides a sequence into several consecutive subsequences, then extracts the patterns of each subsequence, and finally combines these patterns to obtain a complete representation of the lncRNA sequence. After that, a text convolutional neural network is employed to learn high-level features and perform the prediction task. Compared with traditional machine learning models, popular representation methods and existing predictors, DeepLncLoc achieved better performance, which shows that DeepLncLoc could effectively predict lncRNA subcellular localization. Our study not only presented a novel computational model for predicting lncRNA subcellular localization but also introduced a new subsequence embedding method which is expected to be applied in other sequence-based prediction tasks. The DeepLncLoc web server is freely accessible at http://bioinformatics.csu.edu.cn/DeepLncLoc/, and source code and datasets can be downloaded from https://github.com/CSUBioGroup/DeepLncLoc.


Subject(s)
Deep Learning , Long Noncoding RNA , Computational Biology/methods , Neural Networks (Computer) , Long Noncoding RNA/genetics , Software
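The DeepLncLoc abstract above notes that a global k-mer profile discards sequence order, whereas splitting a sequence into consecutive subsequences keeps it. The following is a minimal sketch of that idea only, not the published method: the per-chunk k-mer counts stand in for the pattern-extraction step, which the abstract does not detail, and the function names, `k`, and the number of chunks are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq (order-free representation)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_consecutive(seq, n_parts):
    """Split seq into n_parts consecutive, non-overlapping chunks of near-equal length."""
    size, extra = divmod(len(seq), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def ordered_profile(seq, k, n_parts):
    """One k-mer count per chunk, kept in chunk order (hypothetical stand-in
    for the per-subsequence pattern extraction described in the abstract)."""
    return [kmer_counts(chunk, k) for chunk in split_consecutive(seq, n_parts)]


# Two toy sequences that share exactly the same set of 2-mers.
seq_a = "AACAGA"
seq_b = "AGACAA"

same_global = kmer_counts(seq_a, 2) == kmer_counts(seq_b, 2)
same_ordered = ordered_profile(seq_a, 2, 2) == ordered_profile(seq_b, 2, 2)
print(same_global, same_ordered)
```

The two toy sequences are indistinguishable by their global 2-mer counts but differ once the counts are taken per chunk, which is the order information the abstract refers to.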
7.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34905768

ABSTRACT

Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids was released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/.


Subject(s)
DNA-Binding Proteins/chemistry , DNA/chemistry , Deep Learning , Intrinsically Disordered Proteins/chemistry , RNA-Binding Proteins/chemistry , RNA/chemistry , Computational Biology/methods , DNA/metabolism , DNA-Binding Proteins/metabolism , Humans , Neural Networks (Computer) , Nucleic Acids/metabolism , Protein Binding , Proteome/metabolism , RNA/metabolism , RNA-Binding Proteins/metabolism
8.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36458923

ABSTRACT

MOTIVATION: Protein essentiality is usually accepted to be a conditional trait and strongly affected by cellular environments. However, existing computational methods often do not take such characteristics into account, preferring to incorporate all available data and train a general model for all cell lines. In addition, the lack of model interpretability limits further exploration and analysis of essential protein predictions. RESULTS: In this study, we proposed DeepCellEss, a sequence-based interpretable deep learning framework for cell line-specific essential protein predictions. DeepCellEss utilizes a convolutional neural network and bidirectional long short-term memory to learn short- and long-range latent information from protein sequences. Further, a multi-head self-attention mechanism is used to provide residue-level model interpretability. For model construction, we collected extremely large-scale benchmark datasets across 323 cell lines. Extensive computational experiments demonstrate that DeepCellEss yields effective prediction performance for different cell lines and outperforms existing sequence-based methods as well as network-based centrality measures. Finally, we conducted some case studies to illustrate the necessity of considering specific cell lines and the superiority of DeepCellEss. We believe that DeepCellEss can serve as a useful tool for predicting essential proteins across different cell lines. AVAILABILITY AND IMPLEMENTATION: The DeepCellEss web server is available at http://csuligroup.com:8000/DeepCellEss. The source code and data underlying this study can be obtained from https://github.com/CSUBioGroup/DeepCellEss. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Proteins/metabolism , Amino Acid Sequence , Software , Cell Line , Computational Biology/methods
9.
Opt Express ; 32(12): 21007-21016, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38859466

ABSTRACT

Finding suitable fiber amplifiers is one of the key strategies to increase the transmission capacity of fiber links. Recently, bismuth-doped fiber amplifiers (BDFAs) have attracted much attention due to their distinctive ultra-wideband luminescence properties. In this paper, we propose a linear cavity double pass structure for BDFA operating in the O and E bands. The design creates a linear cavity within the amplifier by combining a fiber Bragg grating (FBG) and a fiber mirror to achieve dual-wavelength pump at 1240 nm and 1310 nm. Meanwhile, the configuration of a circulator and mirror facilitates bidirectional signal propagation through the BDFA, resulting in a double-pass amplification structure. We have tested and analyzed the performance of the linear cavity double pass structure BDFA under different pump schemes and compared it with the conventional structure BDFA. The results show that the gain spectrum of the new structure is shifted toward longer wavelengths, and the gain band is extended from the O band to the O and E bands compared with the conventional structure. In particular, the linear cavity double pass structure BDFA has more relaxed requirements on the stability of the pump and signal power. This work provides a positive reference for the design, application, and development of BDFAs.

10.
Org Biomol Chem ; 22(14): 2851-2862, 2024 04 03.
Article in English | MEDLINE | ID: mdl-38516867

ABSTRACT

Hypochlorous acid (HOCl) released from activated leukocytes plays a significant role in the human immune system, but is also implicated in numerous diseases due to its inappropriate production. Chlorinated nucleobases induce genetic changes that potentially enable and stimulate carcinogenesis, and thus have attracted considerable attention. However, their multiple halogenation sites pose challenges to identify them. As a good complement to experiments, quantum chemical computation was used to uncover chlorination sites and chlorinated products in this study. The results indicate that anion salt forms of all purine compounds play significant roles in chlorination except for adenosine. The kinetic reactivity order of all reaction sites in terms of the estimated apparent rate constant k_obs-est (in M⁻¹ s⁻¹) is heterocyclic NH/N (10² to 10⁷) > exocyclic NH2 (10⁻² to 10) > heterocyclic C8 (10⁻⁵ to 10⁻¹), but the order is reversed for thermodynamics. Combining kinetics and thermodynamics, the numerical simulation results show that N9 is the most reactive site for purine bases to form the main initial chlorinated product, while for purine nucleosides N1 and exocyclic N2/N6 are the most reactive sites to produce the main products controlled by kinetics and thermodynamics, respectively, and C8 is a possible site to generate the minor product. The formation mechanisms of biomarker 8-Cl- and 8-oxo-purine derivatives were also investigated. Additionally, the structure-kinetic reactivity relationship study reveals a good correlation between lg k_obs-est and APT charge in all purine compounds compared to FED2 (HOMO), which proves again that the electrostatic interaction plays a key role. The results are helpful to further understand the reactivity of various reaction sites in aromatic compounds during chlorination.


Subject(s)
Nucleosides , Chemical Water Pollutants , Humans , Nucleosides/chemistry , Halogenation , Catalytic Domain , Purine Nucleosides , Hypochlorous Acid/chemistry , Kinetics , Chlorine/chemistry , Chemical Water Pollutants/chemistry
11.
J Am Chem Soc ; 145(29): 15695-15701, 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37435957

ABSTRACT

The highly enantioselective and complete hydrogenation of protected indoles and benzofurans has been developed, affording facile access to a range of chiral three-dimensional octahydroindoles and octahydrobenzofurans, which are prevalent in many bioactive molecules and organocatalysts. Remarkably, we are in control of the nature of the ruthenium N-heterocyclic carbene complex and employed the complex as both homogeneous and heterogeneous catalysts, providing new avenues for its potential applications in the asymmetric hydrogenation of more challenging aromatic compounds.

12.
Angew Chem Int Ed Engl ; 61(24): e202203212, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35357071

ABSTRACT

A phosphine-catalyzed highly enantioselective and diastereoselective (up to 98% ee and >20:1 dr) (3+2) annulation between vinylcyclopropanes and N-tosylaldimines has been developed, which allows facile access to a range of highly functionalized chiral pyrrolidines. Notably, this method makes use of vinylcyclopropanes as a synthon in a phosphine-mediated asymmetric annulation reaction, which will offer new opportunities for potential applications of cyclopropane substrates in phosphine-catalyzed organic transformations.


Subject(s)
Imines , Pyrrolidines , Catalysis , Phosphines , Stereoisomerism
13.
Angew Chem Int Ed Engl ; 61(47): e202209494, 2022 11 21.
Article in English | MEDLINE | ID: mdl-36200408

ABSTRACT

A sequential phosphine-catalyzed asymmetric [3+2] annulation of aldimines with allenoates and oxidative central-to-axial chirality transfer strategy has been developed. This approach is operationally simple, allowing for rapid access to a range of axially chiral CF3-containing 2-arylpyrroles with high enantiomeric excess. Furthermore, an atroposelective synthesis of esaxerenone is presented, illustrating the practical potential of the reported method.


Subject(s)
Phosphines , Catalysis , Stereoisomerism , Oxidative Stress
14.
Bioinformatics ; 36(Suppl_2): i735-i744, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381815

ABSTRACT

MOTIVATION: Knowledge of protein-binding residues (PBRs) improves our understanding of protein-protein interactions, contributes to the prediction of protein functions and facilitates protein-protein docking calculations. While many sequence-based predictors of PBRs were published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One unexplored option to improve the predictive quality is to design consensus predictors that combine results produced by multiple methods. RESULTS: We empirically investigate predictive performance of a representative set of nine predictors of PBRs. We report substantial differences in predictive quality when these methods are used to predict individual proteins, which contrast with the dataset-level benchmarks that are currently used to assess and compare these methods. Our analysis provides new insights for the cross-prediction concern, dissects complementarity between predictors and demonstrates that predictive performance of the top methods depends on unique characteristics of the input protein sequence. Using these insights, we developed PROBselect, a first-of-its-kind consensus predictor of PBRs. Our design is based on the dynamic predictor selection at the protein level, where the selection relies on regression-based models that accurately estimate predictive performance of selected predictors directly from the sequence. Empirical assessment using a low-similarity test dataset shows that PROBselect provides significantly improved predictive quality when compared with the current predictors and conventional consensuses that combine residue-level predictions. Moreover, PROBselect informs the users about the expected predictive quality for the prediction generated from a given input protein. AVAILABILITY AND IMPLEMENTATION: PROBselect is available at http://bioinformatics.csu.edu.cn/PROBselect/home/index. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Proteins , Amino Acid Sequence , Protein Databases , Protein Binding , Proteins/metabolism
15.
Bioinformatics ; 36(4): 1114-1120, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31593229

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. Actually, global features of protein sequences are critical for PPI site prediction. RESULTS: A new end-to-end deep learning framework, named DeepPPISP, through combining local contextual and global sequence features, is proposed for PPI site prediction. For local contextual features, we use a sliding window to capture features of neighbors of a target amino acid as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. Then the local contextual and global sequence features are combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves the state-of-the-art performance, which is better than the other competing methods. In order to investigate if global sequence features are helpful in our deep learning model, we remove or change some components in DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP. AVAILABILITY AND IMPLEMENTATION: The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks (Computer) , Amino Acid Sequence , Protein Interaction Domains and Motifs , Proteins , Software
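The DeepPPISP abstract above describes using a sliding window to capture the neighbors of each target amino acid as local contextual features. A minimal sketch of that windowing step follows; the window half-width, the padding symbol `X`, and the function name are assumptions made for illustration and are not taken from the paper.

```python
def sliding_windows(sequence, half_width, pad="X"):
    """Return one window of length 2 * half_width + 1 centered on each residue.

    Positions beyond either end of the sequence are filled with the pad
    symbol so that every residue gets a window of the same length.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Toy sequence; the center of each window is the target residue.
windows = sliding_windows("MKTAY", half_width=2)
for residue, window in zip("MKTAY", windows):
    print(residue, window)
```

In a real predictor each window would be turned into numeric features (for example, per-residue profile columns) before being passed to a model; here the raw letters are kept to make the neighborhood of each target visible.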
16.
Opt Express ; 29(15): 23682-23700, 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-34614629

ABSTRACT

Classic algebraic reconstruction technique (ART) for computed tomography requires pre-determined weights of the voxels for the projected pixel values to build the equations. However, such weights cannot be accurately obtained in the application of chemiluminescence measurements due to the high physical complexity and computation resources required. Moreover, streaks arise in the results from ART method especially with imperfect projections. In this study, we propose a semi-case-wise learning-based method named Weight Encode Reconstruction Network (WERNet) to co-learn the target phantom intensities and the adaptive weight matrix of the case without labeling the target voxel set and thus offers a more applicable solution for computed tomography problems. Both numerical and experimental validations were conducted to evaluate the algorithm. In the numerical test, with the help of gradient normalization, the WERNet reconstructed voxel set with a high accuracy and showed a higher capability of denoising compared to the classic ART methods. In the experimental test, WERNet produces comparable results to the ART method while having a better performance in avoiding the streaks. Furthermore, with the adaptive weight matrix, WERNet is not sensitive to the ensemble intensity of the projection which shows much better robustness than ART method.
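
The abstract above contrasts WERNet with the classic algebraic reconstruction technique, which needs a pre-determined weight matrix relating voxels to projected pixel values. As a point of reference, here is a minimal sketch of the textbook ART (Kaczmarz) row-action update on a tiny consistent system. It is not the authors' code; the weight matrix, projection values, sweep count, and relaxation factor are made-up illustrative values.

```python
def art_reconstruct(weights, projections, sweeps=200, relaxation=1.0):
    """Classic ART (Kaczmarz) iteration for weights @ x = projections.

    weights:     list of rows, one row of voxel weights per projected pixel
    projections: measured pixel values, one per row
    Each step projects the current estimate onto the hyperplane of one row.
    """
    x = [0.0] * len(weights[0])
    for _ in range(sweeps):
        for row, measured in zip(weights, projections):
            norm_sq = sum(w * w for w in row)
            if norm_sq == 0.0:
                continue  # a pixel that sees no voxel carries no information
            predicted = sum(w * v for w, v in zip(row, x))
            step = relaxation * (measured - predicted) / norm_sq
            x = [v + step * w for v, w in zip(x, row)]
    return x


# Hypothetical 2-voxel, 2-pixel system whose exact solution is x = [1, 3].
weights = [[2.0, 1.0], [1.0, 3.0]]
projections = [5.0, 10.0]
voxels = art_reconstruct(weights, projections)
print(voxels)
```

The update depends entirely on `weights` being known in advance, which is the requirement the abstract says is hard to meet for chemiluminescence measurements.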

17.
Methods ; 179: 73-80, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32387314

ABSTRACT

In recent years, accumulating studies have shown that long non-coding RNAs (lncRNAs) not only play an important role in the regulation of various biological processes but also are the foundation for understanding mechanisms of human diseases. Due to the high cost of traditional biological experiments, the number of experimentally verified lncRNA-disease associations is very limited. Thus, many computational approaches have been proposed to discover the underlying associations between lncRNAs and diseases. However, the associations between lncRNAs and diseases are too complicated to model by using only traditional matrix factorization-based methods. In this study, we propose a hybrid computational framework (SDLDA) for the lncRNA-disease association prediction. In our computational framework, we use singular value decomposition and deep learning to extract linear and non-linear features of lncRNAs and diseases, respectively. Then we train SDLDA by combining the linear and non-linear features. Compared to previous computational methods, the combination of linear and non-linear features reinforces each other, which is better than using only either matrix factorization or deep learning. The computational results show that SDLDA performs better than existing methods in the leave-one-out cross-validation. Furthermore, the case studies show that 28 out of 30 cancer-related lncRNAs (10 for gastric cancer, 10 for colon cancer and 8 for renal cancer) are verified by mining recent biomedical literature. Code and data can be accessed at https://github.com/CSUBioGroup/SDLDA.


Subject(s)
Computational Biology/methods , Deep Learning , Genetic Association Studies/methods , Long Noncoding RNA/metabolism , Data Mining/methods , Genetic Databases , Datasets as Topic , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Neoplasms/genetics , Long Noncoding RNA/genetics
18.
Appl Opt ; 60(22): 6469-6478, 2021 Aug 01.
Article in English | MEDLINE | ID: mdl-34612882

ABSTRACT

Classic algorithms for computed tomography of chemiluminescence include two main steps: tomographic weight matrix calculation using imaging models, and inverse calculation using algebraic reconstruction techniques (ARTs). However, pre-calculated weight matrices require a large amount of storage, and accurate voxel weights may not be obtained using a simplified imaging model. In this study, we propose a new, to the best of our knowledge, method named the multi-weight encode reconstruction network (Multi-WERNet) to learn the implicit light propagation physics from the multi-projections of different flames and simultaneously reconstruct the 3D flame chemiluminescence. The reconstructed results from Multi-WERNet are close to those of ART, and no radial streak is found, which is commonly seen in ART-based methods. With the help of information from different flames, the results reconstructed with 5 views using Multi-WERNet outperform the ART method. Moreover, Multi-WERNet successfully learns the implicit light propagation physics as a voxel weight encoder and can be transferred to unseen cases. Finally, Multi-WERNet is found to have higher robustness than ART in reconstruction with imperfect projections, which makes the algorithm more practical.

19.
Proteomics ; 19(12): e1900019, 2019 06.
Article in English | MEDLINE | ID: mdl-30941889

ABSTRACT

Annotation of protein functions plays an important role in understanding life at the molecular level. High-throughput sequencing produces massive numbers of raw protein sequences and only about 1% of them have been manually annotated with functions. Experimental annotations of functions are expensive, time-consuming and do not keep up with the rapid growth of the sequence numbers. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence- and network-derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs collected from the InterPro tool that is associated with the input protein sequence. This vector is processed with two neural layers to obtain a low-dimensional vector which is combined with topological information extracted from protein-protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.


Subject(s)
Amino Acid Sequence/genetics , Molecular Sequence Annotation , Proteins/genetics , Algorithms , Computational Biology , Deep Learning , High-Throughput Nucleotide Sequencing , Humans , Neural Networks (Computer)
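The DeepFunc abstract above reports an Fmax score on the CAFA3 dataset without defining it. The sketch below shows the protein-centric Fmax as it is commonly defined for CAFA-style evaluation: at each score threshold, precision is averaged over proteins with at least one prediction and recall over all proteins, and Fmax is the best F-measure across thresholds. This is a simplified illustration, assuming every protein has at least one true term and ignoring ontology propagation; the data and names are invented.

```python
def fmax(truth, scores, steps=100):
    """Protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.

    truth:  {protein: set of true terms}, each set assumed non-empty
    scores: {protein: {term: predicted score in [0, 1]}}
    """
    best = 0.0
    n_proteins = len(truth)
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions, recall_sum = [], 0.0
        for protein, true_terms in truth.items():
            predicted = {
                term
                for term, score in scores.get(protein, {}).items()
                if score >= threshold
            }
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n_proteins
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Invented toy data: two proteins, four terms.
truth = {"p1": {"a", "b"}, "p2": {"c"}}
scores = {"p1": {"a": 0.9, "d": 0.4}, "p2": {"c": 0.6}}
result = fmax(truth, scores)
print(round(result, 4))
```

On the toy data the best trade-off occurs at thresholds between 0.4 and 0.6, where the false term `d` is dropped while `c` is still predicted, giving precision 1.0 and recall 0.75.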
20.
Angew Chem Int Ed Engl ; 58(19): 6260-6264, 2019 05 06.
Article in English | MEDLINE | ID: mdl-30746821

ABSTRACT

Construction of contiguous all-carbon quaternary stereogenic centers is a long-standing challenge in synthetic organic chemistry. In this report, a phosphine-catalyzed enantioselective (3+2) annulation reaction between allenes and isoindigos, containing either two identical or different oxindole moieties, is introduced as a powerful strategy for the construction of spirocyclic bisindoline alkaloid core structures. The reported reactions feature high chemical yields, excellent enantioselectivities, and very good regioselectivities, and are highly useful for creating structurally challenging bisindoline natural products.
