ABSTRACT
In this work, we established, validated, and optimized a novel computational framework for tracing arbitrarily oriented actin filaments in cryo-electron tomography maps. Our approach was designed for highly complex intracellular architectures in which a long-range cytoskeleton network extends throughout the cell bodies and protrusions. The irregular organization of the actin network, as well as cryo-electron-tomography-specific noise, missing wedge artifacts, and map dimensions call for a specialized implementation that is both robust and efficient. Our proposed solution, Struwwel Tracer, accumulates densities along paths of a specific length in various directions, starting from locally determined seed points. The highest-density paths originating from the seed points form short linear candidate filament segments, which are further scrutinized and classified by users via inspection of a novel pruning map, which visualizes the likelihood of being a part of longer filaments. The pruned linear candidate filament segments are then iteratively fused into continuous, longer, and curved filaments based on their relative orientations, gap spacings, and extendibility. When applied to the simulated phantom tomograms of a Dictyostelium discoideum filopodium under experimental conditions, Struwwel Tracer demonstrated high efficacy, with F1-scores ranging from 0.85 to 0.90, depending on the noise level. Furthermore, when applied to a previously untraced experimental tomogram of mouse fibroblast lamellipodia, the filaments predicted by Struwwel Tracer exhibited a good visual agreement with the experimental map. The Struwwel Tracer framework is highly time efficient and can complete the tracing process in just a few minutes. The source code is publicly available with version 3.2 of the free and open-source Situs software package.
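The accumulation of density along fixed-length paths from a seed point can be illustrated with a minimal sketch. This is not the Situs implementation: the function names `path_density` and `best_direction`, the nearest-voxel sampling, and the nested-list layout `volume[z][y][x]` are assumptions made purely for illustration.

```python
import math


def path_density(volume, seed, direction, length):
    """Sum voxel densities along a straight path of `length` steps from `seed`.

    volume    -- nested lists indexed as volume[z][y][x] (assumed layout)
    seed      -- (z, y, x) start voxel
    direction -- (dz, dy, dx) vector, normalized internally
    """
    norm = math.sqrt(sum(c * c for c in direction))
    step = [c / norm for c in direction]
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    total = 0.0
    for t in range(length + 1):
        # Nearest-voxel sampling; a real tracer would likely interpolate.
        z, y, x = (round(seed[i] + t * step[i]) for i in range(3))
        if not (0 <= z < nz and 0 <= y < ny and 0 <= x < nx):
            break  # path left the map
        total += volume[z][y][x]
    return total


def best_direction(volume, seed, directions, length):
    """Return (density, direction) of the highest-density path from `seed`."""
    scored = [(path_density(volume, seed, d, length), d) for d in directions]
    return max(scored, key=lambda item: item[0])
```

Calling `best_direction` once per seed point yields one short linear candidate segment per seed; pruning and fusion of those segments are separate steps not shown here.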
Subjects
Dictyostelium, Mice, Animals, Actin Cytoskeleton, Cytoskeleton, Actins, Electron Microscope Tomography/methods
ABSTRACT
Electron cryo-tomography allows for high-resolution imaging of stereocilia in their native state. Because their actin filaments have a higher degree of order, we imaged stereocilia from mice lacking the actin crosslinker plastin 1 (PLS1). We found that while stereocilia actin filaments run 13 nm apart in parallel for long distances, there were gaps of significant size that were stochastically distributed throughout the actin core. Actin crosslinkers were distributed through the stereocilium, but did not occupy all possible binding sites. At stereocilia tips, protein density extended beyond actin filaments, especially on the side of the tip where a tip link is expected to anchor. Along the shaft, repeating density was observed that corresponds to actin-to-membrane connectors. In the taper region, most actin filaments terminated near the plasma membrane. The remaining filaments twisted together to make a tighter bundle than was present in the shaft region; the spacing between them decreased from 13 nm to 9 nm, and the apparent filament diameter decreased from 6.4 to 4.8 nm. Our models illustrate detailed features of distinct structural domains that are present within the stereocilium.
Subjects
Actin Cytoskeleton/metabolism, Actins/metabolism, Electron Microscope Tomography/methods, Vestibular Hair Cells/metabolism, Membrane Glycoproteins/metabolism, Microfilament Proteins/metabolism, Actin Cytoskeleton/genetics, Animals, Membrane Glycoproteins/genetics, Mice, Microfilament Proteins/genetics
ABSTRACT
Cryo-electron microscopy (cryo-EM) density maps at medium resolution (5-10 Å) reveal secondary structural features such as α-helices and β-sheets, but they lack the side chain details that would enable a direct structure determination. Among the more than 800 entries in the Electron Microscopy Data Bank (EMDB) of medium-resolution density maps that are associated with atomic models, a wide variety of similarities can be observed between maps and models. To validate such atomic models and to classify structural features, a local similarity criterion, the F1 score, is proposed and evaluated in this study. The F1 score is theoretically normalized to a range from zero to one, providing a local measure of cylindrical agreement between the density and atomic model of a helix. A systematic scan of 30,994 helices (among 3,247 protein chains modeled into medium-resolution density maps) reveals an actual range of observed F1 scores from 0.171 to 0.848, suggesting that the cylindrical fit of the current data is well stratified by the proposed measure. The best (highest) F1 scores tend to be associated with regions that exhibit high and spatially homogeneous local resolution (between 5 Å and 7.5 Å) in the helical density. The proposed F1 scores can be used as a discriminative classifier for validation studies and as a ranking criterion for cryo-EM density features in databases.
Subjects
Proteins, Cryoelectron Microscopy, Molecular Models, alpha-Helical Protein Conformation, beta-Strand Protein Conformation
ABSTRACT
Cryo-electron microscopy (cryo-EM) is a structure determination method for large molecular complexes. As more and more atomic structures are determined using this technique, it is becoming possible to perform statistical characterization of side-chain conformations. Two data sets were used to characterize block lengths for each of the 18 types of amino acids. One set contains 9131 structures resolved using X-ray crystallography from density maps with resolutions of 1.5 Å or better, and the other contains 237 protein structures derived from cryo-EM density maps with 2-4 Å resolutions. The results show that the normalized probability density function of block lengths is similar between the X-ray data set and the cryo-EM data set for most of the residue types, but differences were observed for ARG, GLU, ILE, LYS, PHE, TRP, and TYR, for which conformations with certain shorter block lengths are more likely to be observed in the cryo-EM set with 2-4 Å resolutions.
Subjects
Cryoelectron Microscopy, X-Ray Crystallography, Molecular Models, Cryoelectron Microscopy/methods, X-Ray Crystallography/methods, Protein Conformation, Proteins/chemistry
ABSTRACT
Cryo-electron tomography (cryo-ET) is a powerful method of visualizing the three-dimensional organization of supramolecular complexes, such as the cytoskeleton, in their native cell and tissue contexts. Due to its minimal electron dose and reconstruction artifacts arising from the missing wedge during data collection, cryo-ET typically results in noisy density maps that display anisotropic XY versus Z resolution. Molecular crowding further exacerbates the challenge of automatically detecting supramolecular complexes, such as the actin bundle in hair cell stereocilia. Stereocilia are pivotal to the mechanoelectrical transduction process in inner ear sensory epithelial hair cells. Given the complexity and dense arrangement of actin bundles, traditional approaches to filament detection and tracing have failed in these cases. In this study, we introduce BundleTrac, an effective method to trace hundreds of filaments in a bundle. A comparison between BundleTrac and manually tracing the actin filaments in a stereocilium showed that BundleTrac accurately built 326 of 330 filaments (98.8%), with an overall cross-distance of 1.3 voxels for the 330 filaments. BundleTrac is an effective semi-automatic modeling approach in which a seed point is provided for each filament and the rest of the filament is computationally identified. We also demonstrate the potential of a denoising method that uses a polynomial regression to address the resolution and high-noise anisotropic environment of the density map.
Subjects
Actin Cytoskeleton/ultrastructure, Stereocilia/ultrastructure, Algorithms, Animals, Electron Microscope Tomography, Humans, Regression Analysis, Stereocilia/metabolism
ABSTRACT
While the acquisition of cryo-electron microscopy (cryo-EM) density maps at near-atomic resolution is becoming more prevalent, a considerable number of density maps are still resolved only at intermediate resolutions (5-10 Å). Due to the large variation in quality among these medium-resolution density maps, extracting structural information from them remains a challenging task. This study introduces a convolutional neural network (CNN)-based framework, cryoSSESeg, to determine the organization of protein secondary structure elements in medium-resolution cryo-EM images. CryoSSESeg is trained on approximately 1300 protein chains derived from around 500 experimental cryo-EM density maps of varied quality. It demonstrates strong performance with residue-level F1 scores of 0.76 for helix detection and 0.60 for β-sheet detection on average across a set of testing chains. In comparison to traditional image processing tools like SSETracer, which demand significant manual intervention and preprocessing steps, cryoSSESeg demonstrates comparable or superior performance. Additionally, it demonstrates competitive performance alongside another deep learning-based model, Emap2sec. Furthermore, this study underscores the importance of secondary structure quality, particularly adherence to expected shapes, in detection performance, emphasizing the necessity for careful evaluation of the data quality.
ABSTRACT
Within cells, cytoskeletal filaments are often arranged into loosely aligned bundles. These fibrous bundles are dense enough to exhibit a certain regularity and mean direction; however, their packing is not sufficient to impose a symmetry between, or a specific shape on, individual filaments. This intermediate regularity is computationally difficult to handle because individual filaments have a certain directional freedom, yet the filament densities are not well segmented from each other (especially in the presence of noise, such as in cryo-electron tomography). In this paper, we develop a dynamic programming-based framework, Spaghetti Tracer, to characterize the structural arrangement of filaments in the challenging 3D maps of subcellular components. Assuming that the tomogram can be rotated such that the filaments are oriented in a mean direction, the proposed framework first identifies local seed points for candidate filament segments, which are then grown from the seeds using a dynamic programming algorithm. We validate various algorithmic variations of our framework on simulated tomograms that closely mimic the noise and appearance of experimental maps. Because the ground truth is known in the simulated tomograms, a statistical analysis consisting of precision, recall, and F1 scores allows us to optimize the performance of this new approach. We find that a bipyramidal accumulation scheme for path density is superior to straight-line accumulation. In addition, the multiplication of forward and backward path densities provides an efficient filter that lifts the filament density above the noise level. The result of our tests is a robust method that can be expected to perform well (F1 scores 0.86-0.95) under experimental noise conditions.
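The precision, recall, and F1 statistics mentioned above follow the standard definitions. A minimal sketch, assuming that predictions and ground truth have already been reduced to counts of true positives, false positives, and false negatives (how a traced filament is matched to a ground-truth filament is not specified in the abstract and is left out; the function name is hypothetical):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from match counts.

    tp -- predicted items that match the ground truth
    fp -- predicted items with no ground-truth match
    fn -- ground-truth items that were not predicted
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a convenient single figure for comparing algorithmic variants against a known ground truth.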
Subjects
Algorithms, Electron Microscope Tomography, Cytoskeleton, Electron Microscope Tomography/methods
ABSTRACT
Bengali is a low-resource language that lacks tools and resources for various natural language processing (NLP) tasks, such as sentiment analysis or profanity identification. In Bengali, only the translated versions of English sentiment lexicons are available. Moreover, no dictionary exists for detecting profanity in Bengali social media text. This study introduces a Bengali sentiment lexicon, BengSentiLex, and a Bengali swear lexicon, BengSwearLex. For creating BengSentiLex, a cross-lingual methodology is proposed that utilizes a machine translation system, a review corpus, two English sentiment lexicons, pointwise mutual information (PMI), and supervised machine learning (ML) classifiers in various stages. A semi-automatic methodology is presented to develop BengSwearLex that leverages an obscene corpus, word embedding, and part-of-speech (POS) taggers. The performance of BengSentiLex is compared with that of the translated English lexicons on three evaluation datasets. BengSentiLex achieves a 5%-50% improvement over the translated lexicons. For identifying profanity, BengSwearLex achieves document-level coverage of around 85% in the evaluation dataset. The experimental results imply that BengSentiLex and BengSwearLex are effective resources for classifying sentiment and identifying profanity in Bengali social media content, respectively.
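For readers unfamiliar with pointwise mutual information, the quantity is PMI(w, c) = log2(P(w, c) / (P(w) P(c))). The sketch below computes it from raw co-occurrence counts. How PMI is actually applied within the BengSentiLex pipeline is not detailed in the abstract, so the function name, the count-based interface, and the handling of zero counts are assumptions.

```python
import math


def pmi(count_wc, count_w, count_c, total):
    """Pointwise mutual information log2(P(w, c) / (P(w) * P(c))).

    count_wc -- documents containing word w that carry class label c
    count_w  -- documents containing word w
    count_c  -- documents carrying class label c
    total    -- all documents
    """
    if min(count_wc, count_w, count_c) == 0:
        # Undefined or unbounded case; returning -inf is a choice of this sketch.
        return float("-inf")
    p_wc = count_wc / total
    p_w = count_w / total
    p_c = count_c / total
    return math.log2(p_wc / (p_w * p_c))
```

A positive value means the word co-occurs with the class more often than independence would predict; a common way to score polarity is the difference between a word's PMI with the positive class and its PMI with the negative class.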
ABSTRACT
The presence of abusive and vulgar language in social media has become an issue of increasing concern in recent years. However, the prevalence and identification of vulgar language have remained largely unexplored in low-resource languages such as Bengali. In this paper, we provide the first comprehensive analysis of the presence of vulgarity in Bengali social media content. We develop two benchmark corpora consisting of 7,245 reviews collected from YouTube and manually annotate them into vulgar and non-vulgar categories. The manual annotation reveals the ubiquity of vulgar and swear words in Bengali social media content (i.e., in two corpora), ranging from 20% to 34%. To automatically identify vulgarity, we employ various approaches, such as classical machine learning (CML) classifiers, Stochastic Gradient Descent (SGD) optimizer, a deep learning (DL) based architecture, and lexicon-based methods. We find that the swear/vulgar lexicon, although small in size, is effective at identifying vulgar language due to the high presence of some swear terms in Bengali social media. We observe that the performance of machine learning (ML) classifiers is affected by the class distribution of the dataset. The DL-based BiLSTM (Bidirectional Long Short Term Memory) model yields the highest recall scores for identifying vulgarity in both datasets (i.e., in both original and class-balanced settings). In addition, the analysis reveals that vulgarity is highly correlated with negative sentiment in social media comments.
ABSTRACT
As automated filament tracing algorithms in cryo-electron tomography (cryo-ET) continue to improve, the validation of these approaches has become increasingly important. Having a known ground truth on which to base predictions is crucial to reliably test predicted cytoskeletal filaments because the detailed structure of the filaments in experimental tomograms is obscured by low resolution, as well as by noise and missing-wedge artifacts in Fourier space. We present a software tool for the realistic simulation of tomographic maps (TomoSim) based on a known filament trace. The parameters of the simulated map are automatically matched to those of a corresponding experimental map. We describe the computational details of the first prototype of our approach, which includes wedge masking in Fourier space, noise color, and signal-to-noise matching. We also discuss current and potential future applications of the approach in the validation of concurrent filament tracing methods in cryo-ET.
ABSTRACT
We propose a fast, dynamic programming-based framework for tracing actin filaments in 3D maps of subcellular components in cryo-electron tomography. The approach can identify high-density filament segments in various orientations, but it takes advantage of the arrangement of actin filaments within cells into more or less tightly aligned bundles. Assuming that the tomogram can be rotated such that the filaments are oriented along a dominant direction (i.e., the X, Y, or Z axis), the proposed framework first identifies local seed points that form the origin of candidate filament segments (CFSs), which are then grown from the seeds using a fast dynamic programming algorithm. The CFS length l can be tuned to the nominal resolution of the tomogram or the separation of desired features, or it can be used to restrict the curvature of filaments that deviate from the overall bundle direction. In subsequent steps, the CFSs are filtered based on backward tracing and path density analysis. Finally, neighboring CFSs are fused based on a collinearity criterion to bridge any noise artifacts in the 3D map that would otherwise fragment the tracing. We validate our proposed framework on simulated tomograms that closely mimic the features and appearance of experimental maps.
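Growing a candidate filament segment from a seed by dynamic programming can be sketched as follows. This is an illustration under stated assumptions rather than the published algorithm: the dominant direction is taken to be +X, the path is allowed to shift by at most one voxel in Y and in Z per unit step in X, and the name `grow_segment` and the nested-list layout `vol[x][y][z]` are hypothetical.

```python
def grow_segment(vol, seed, length):
    """Grow a segment of up to `length` steps along +X from seed = (x, y, z).

    vol[x][y][z] holds densities. Returns (accumulated density, path), where
    path is the list of voxels on the highest-density route found.
    """
    x0, y0, z0 = seed
    ny, nz = len(vol[0]), len(vol[0][0])
    length = min(length, len(vol) - 1 - x0)  # do not run past the map edge
    shifts = [(dy, dz) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
    # score maps each reachable (y, z) to the best accumulated density so far.
    score = {(y0, z0): vol[x0][y0][z0]}
    back = []  # back[k - 1][(y, z)] is the predecessor of (y, z) at step k
    for k in range(1, length + 1):
        new_score, pred = {}, {}
        for (y, z), s in score.items():
            for dy, dz in shifts:
                yy, zz = y + dy, z + dz
                if 0 <= yy < ny and 0 <= zz < nz:
                    cand = s + vol[x0 + k][yy][zz]
                    if cand > new_score.get((yy, zz), float("-inf")):
                        new_score[(yy, zz)] = cand
                        pred[(yy, zz)] = (y, z)
        score = new_score
        back.append(pred)
    # Pick the best end point, then walk the predecessor tables backwards.
    end = max(score, key=score.get)
    total = score[end]
    path = [(x0 + length, *end)]
    for k in range(length, 0, -1):
        end = back[k - 1][end]
        path.append((x0 + k - 1, *end))
    return total, path[::-1]
```

In this sketch the `length` argument plays the role of the CFS length l: a shorter value limits how far a segment can drift laterally from the bundle direction. Backward tracing, path density filtering, and collinearity-based fusion are not shown.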
ABSTRACT
Although cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structures when the resolution of cryo-EM density maps is in the medium resolution range, such as 5-10 Å. Detection of protein secondary structures, such as helices and β-sheets, from cryo-EM density maps provides constraints for deriving atomic structures from such maps. As more deep learning methodologies are being developed for solving various molecular problems, effective tools are needed for users to access them. We have developed an effective software bundle, DeepSSETracer, for the detection of protein secondary structure from cryo-EM component maps in medium resolution. The bundle contains the network architecture and a U-Net model trained with a curriculum and gradient of episodic memory (GEM). The bundle integrates the deep neural network with the visualization capacity provided in ChimeraX. Using a Linux server that is remotely accessed by Windows users, it takes about 6 s on one CPU and one GPU for the trained deep neural network to detect secondary structures in a cryo-EM component map containing 446 amino acids. A test using 28 chain components of cryo-EM maps shows overall residue-level F1 scores of 0.72 and 0.65 to detect helices and β-sheets, respectively. Although deep learning applications are built on software frameworks, such as PyTorch and TensorFlow, our pioneering work here shows that integration of deep learning applications with ChimeraX is a promising and effective approach. Our experiments show that the F1 score measured at the residue level is an effective evaluation of secondary structure detection for individual classes. The test using 28 cryo-EM component maps shows that DeepSSETracer detects β-sheets more accurately than Emap2sec+, with weighted average residue-level F1 scores of 0.65 and 0.42, respectively. It also shows that Emap2sec+ detects helices more accurately than DeepSSETracer, with weighted average residue-level F1 scores of 0.77 and 0.72, respectively.
ABSTRACT
Although cryo-electron microscopy (cryo-EM) has been successfully used to derive atomic structures for many proteins, it is still challenging to derive atomic structures when the resolution of cryo-EM density maps is in the medium range, e.g., 5-10 Å. Studies have attempted to use machine learning methods, especially deep neural networks, to build predictive models for the detection of protein secondary structures from cryo-EM images, which ultimately helps to derive the atomic structure of proteins. However, the large variation in data quality makes it challenging to train a deep neural network with high prediction accuracy. Curriculum learning has been shown to be an effective learning paradigm in machine learning. In this paper, we present a study using curriculum learning as a more effective way to utilize cryo-EM density maps of varying quality. We investigated three distinct training curricula that differ in whether and how images used for training in the past are reused while the network is continually trained using new images. A total of 1,382 three-dimensional cryo-EM images were extracted from density maps of the Electron Microscopy Data Bank in our study. Our results indicate that learning with a curriculum significantly improves the performance of the final trained network when the forgetting problem is properly addressed.
ABSTRACT
Cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) produce 3-D density maps of biological molecules at a range of resolution levels. Pattern recognition tools are important in distinguishing biological components from volumetric maps with the available resolutions. One of the most distinctive characteristics of density maps at medium (5-10 Å) resolution is the visibility of protein secondary structures. Although computational methods have been developed, the accurate detection of helices and β-strands from cryo-EM density maps is still an active research area. We have developed a tool for protein secondary structure detection and evaluation of medium-resolution 3-D cryo-EM density maps, which combines three computational methods (SSETracer, StrandTwister, and AxisComparison). The program was integrated in UCSF Chimera, a popular visualization software in the cryo-EM community. In related work, we have developed BundleTrac, a computational method to trace filaments in a bundle from lower-resolution cryo-ET density maps. It has been applied to actin filament tracing in stereocilia with good accuracy and can potentially be added as a tool in Chimera.