1.
Ecotoxicol Environ Saf ; 185: 109733, 2019 Dec 15.
Article in English | MEDLINE | ID: mdl-31580980

ABSTRACT

The presence of missing data points in datasets is among the main challenges in handling toxicological data for nanomaterials. As the processing of missing data is an important part of data analysis, we have introduced a read-across approach that uses a combination of supervised and unsupervised machine learning techniques to fill the missing values. A series of classification models (supervised learning) was developed to predict the class label, and a self-organizing map approach (unsupervised learning) was used to estimate relative distances between nanoparticles and refine the results obtained during supervised learning. This study addressed the genotoxicity of 49 silicon and metal oxide nanoparticles in Ames and Comet tests. Collected literature data did not demonstrate significant variations related to the change of size, including selected bulk materials. Genotoxicity-related features of nanomaterials were represented by ionic characteristics. General tendencies found in the current study were convincingly linked to known theories of genotoxic action at the nano level. Mechanisms of primary and secondary genotoxic effects were discussed in the context of the developed models.


Subjects
DNA Damage , Metal Nanoparticles/toxicity , Models, Theoretical , Mutagens/toxicity , Unsupervised Machine Learning , Cell Line , Comet Assay , Humans , Metal Nanoparticles/classification , Mutagens/classification , Oxides/classification , Oxides/toxicity , Quantitative Structure-Activity Relationship , Salmonella typhimurium/genetics
2.
PLoS Comput Biol ; 15(9): e1007348, 2019 09.
Article in English | MEDLINE | ID: mdl-31479439

ABSTRACT

Cellular microscopy images contain rich insights about biology. To extract this information, researchers use features, or measurements of the patterns of interest in the images. Here, we introduce a convolutional neural network (CNN) to automatically design features for fluorescence microscopy. We use a self-supervised method to learn feature representations of single cells in microscopy images without labelled training data. We train CNNs on a simple task that leverages the inherent structure of microscopy images and controls for variation in cell morphology and imaging: given one cell from an image, the CNN is asked to predict the fluorescence pattern in a second, different cell from the same image. We show that our method learns high-quality features that describe protein expression patterns in single cells in both yeast and human microscopy datasets. Moreover, we demonstrate that our features are useful for exploratory biological analysis, by capturing high-resolution cellular components in a proteome-wide cluster analysis of human proteins, and by quantifying multi-localized proteins and single-cell variability. We believe paired cell inpainting is a generalizable method to obtain feature representations of single cells in multichannel microscopy images.


Subjects
Microscopy/methods , Single-Cell Analysis/methods , Unsupervised Machine Learning , Cells, Cultured , Computational Biology , Humans , Image Processing, Computer-Assisted/methods , Yeasts/cytology
3.
J Chem Theory Comput ; 15(11): 6343-6357, 2019 Nov 12.
Article in English | MEDLINE | ID: mdl-31476122

ABSTRACT

Phase separation in mixed lipid systems has been extensively studied both experimentally and theoretically because of its biological importance. A detailed description of such complex systems undoubtedly requires novel mathematical frameworks that are capable of decomposing and categorizing the evolution of thousands if not millions of lipids involved in the phenomenon. The interpretation and analysis of molecular dynamics (MD) simulations representing temporal and spatial changes in such systems are still a challenging task. Here, we present an unsupervised machine learning approach based on nonnegative matrix factorization called NMFk that successfully extracts latent (i.e., not directly observable) features from the second layer neighborhood profiles derived from coarse-grained MD simulations of a ternary lipid mixture. Our results demonstrate that NMFk extracts physically meaningful features that uniquely describe the phase separation such as locations and roles of different lipid types, formation of nanodomains, and timescales of lipid segregation.


Subjects
Lipids/chemistry , Unsupervised Machine Learning , 1,2-Dipalmitoylphosphatidylcholine/chemistry , Cholesterol/chemistry , Lipid Bilayers/chemistry , Molecular Dynamics Simulation , Phosphatidylcholines/chemistry
4.
BMC Genomics ; 20(1): 638, 2019 Aug 08.
Article in English | MEDLINE | ID: mdl-31395005

ABSTRACT

BACKGROUND: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis with multi-platform data. However, effective integration of multi-platform data into the solution of a single or multi-task classification problem is critical and challenging. In this study, we proposed HetEnc, a novel deep learning-based approach, for information domain separation. RESULTS: HetEnc includes both an unsupervised feature representation module and a supervised neural network module to handle multi-platform gene expression datasets. It first constructs three different encoding networks to represent the original gene expression data using high-level abstracted features. A six-layer fully-connected feed-forward neural network is then trained using these abstracted features for each targeted endpoint. We applied HetEnc to the SEQC neuroblastoma dataset to demonstrate that it outperforms other machine learning approaches. Although we used multi-platform data in feature abstraction and model training, HetEnc does not need multi-platform data for prediction, enabling a broader application of the trained model by reducing the cost of gene expression profiling for new samples to a single platform. Thus, HetEnc provides a new solution to integrated gene expression analysis, accelerating modern biological research.


Subjects
Computational Biology/methods , Deep Learning , Databases, Factual , Humans , Models, Statistical , Neuroblastoma/genetics , Transcriptome , Unsupervised Machine Learning
5.
Stud Health Technol Inform ; 264: 1464-1465, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438183

ABSTRACT

In 5P medicine (Personalized, Preventive, Participative, Predictive and Pluri-expert), the general trend is to process data by shifting the barycenter of the information from hospital-centered systems to patient-centered ones through the patient's personal medical records. Today, the use of artificial intelligence to support this transition shows real limitations in operational practice, both at the level of patient care and in the general daily life of the health professional, because of the medico-legal imperatives induced by the promises of '5P medicine'. In this paper, we propose to fill this gap by introducing an original artificial intelligence platform, named Maxwell, which follows an unsupervised learning approach in line with the medico-legal imperatives of '5P medicine'. We describe the functional characteristics of the platform and illustrate them with two examples of clustering in genomics and magnetic resonance imaging.


Subjects
Medicine , Unsupervised Machine Learning , Artificial Intelligence , Genomics , Humans , Magnetic Resonance Imaging
6.
EMBO J ; 38(18): e100811, 2019 09 16.
Article in English | MEDLINE | ID: mdl-31436334

ABSTRACT

The retina is a specialized neural tissue that senses light and initiates image processing. Although the functional organization of specific retina cells has been well studied, the molecular profile of many cell types remains unclear in humans. To comprehensively profile the human retina, we performed single-cell RNA sequencing on 20,009 cells from three donors and compiled a reference transcriptome atlas. Using unsupervised clustering analysis, we identified 18 transcriptionally distinct cell populations representing all known neural retinal cells: rod photoreceptors, cone photoreceptors, Müller glia, bipolar cells, amacrine cells, retinal ganglion cells, horizontal cells, astrocytes, and microglia. Our data captured molecular profiles for healthy and putative early degenerating rod photoreceptors, and revealed the loss of MALAT1 expression with longer post-mortem time, which potentially suggested a novel role of MALAT1 in rod photoreceptor degeneration. We have demonstrated the use of this retina transcriptome atlas to benchmark pluripotent stem cell-derived cone photoreceptors and an adult Müller glia cell line. This work provides an important reference with unprecedented insights into the transcriptional landscape of human retinal cells, which is fundamental to understanding retinal biology and disease.


Subjects
Nerve Degeneration/genetics , RNA, Long Noncoding/genetics , Retina/chemistry , Single-Cell Analysis/methods , Transcriptome , Autopsy , Cluster Analysis , Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Regulation , Humans , Organ Specificity , Retinal Rod Photoreceptor Cells/chemistry , Sequence Analysis, RNA , Unsupervised Machine Learning
7.
Int J Med Inform ; 129: 29-36, 2019 09.
Article in English | MEDLINE | ID: mdl-31445269

ABSTRACT

BACKGROUND AND OBJECTIVE: Autism spectrum disorder (ASD) is a heterogeneous disorder. Research has explored potential ASD subgroups with preliminary evidence supporting the existence of behaviorally and genetically distinct subgroups; however, research has yet to leverage machine learning to identify phenotypes on a scale large enough to robustly examine treatment response across such subgroups. The purpose of the present study was to apply Gaussian Mixture Models and Hierarchical Clustering to identify behavioral phenotypes of ASD and examine treatment response across the learned phenotypes. MATERIALS AND METHODS: The present study included a sample of children with ASD (N = 2400), the largest of its kind to date. Unsupervised machine learning was applied to model ASD subgroups as well as their taxonomic relationships. Retrospective treatment data were available for a portion of the sample (n = 1034). Treatment response was examined within each subgroup via regression. RESULTS: The application of a Gaussian Mixture Model revealed 16 subgroups. Further examination of the subgroups through Hierarchical Agglomerative Clustering suggested 2 overlying behavioral phenotypes with unique deficit profiles each composed of subgroups that differed in severity of those deficits. Furthermore, differentiated response to treatment was found across subtypes, with a substantially higher amount of variance accounted for due to the homogenization effect of the clustering. DISCUSSION: The high amount of variance explained by the regression models indicates that clustering provides a basis for homogenization, and thus an opportunity to tailor treatment based on cluster memberships. These findings have significant implications on prognosis and targeted treatment of ASD, and pave the way for personalized intervention based on unsupervised machine learning.


Subjects
Autism Spectrum Disorder/diagnosis , Unsupervised Machine Learning , Child , Child, Preschool , Cluster Analysis , Female , Humans , Male , Phenotype , Prognosis , Retrospective Studies
8.
IEEE Int Conf Rehabil Robot ; 2019: 1167-1172, 2019 06.
Article in English | MEDLINE | ID: mdl-31374787

ABSTRACT

Parkinson disease (PD) is a common neurodegenerative disorder characterized by motor and non-motor impairments. Since the quality of life of PD patients deteriorates as the pathology develops, it is imperative to improve the identification of personalized rehabilitation and treatment approaches based on the level of the neurodegeneration process. Objective and precise assessment of the severity of the pathology is crucial to identify the most appropriate treatments. In this context, this paper proposes a wearable system able to measure the motor performance of PD subjects. Two inertial devices were used to capture the motion of the lower and upper limbs, respectively, while subjects performed six motor tasks. Forty-one kinematic features were extracted from the inertial signals to describe the performance of each subject. Three unsupervised learning algorithms (k-means, self-organizing maps (SOM) and hierarchical clustering) were applied with a blind approach to group the motor performance. The results show that SOM was the best classifier, since it reached an accuracy of 0.950 when grouping the instances into two classes (mild vs advanced), and 0.817 when considering three classes (mild vs moderate vs severe). Therefore, this system enabled objective assessment of PD severity through motion analysis, allowing the evaluation of residual motor capabilities and fostering personalized paths for PD rehabilitation and assistance.


Subjects
Parkinson Disease/physiopathology , Rehabilitation , Algorithms , Female , Humans , Male , Parkinson Disease/therapy , Quality of Life , Unsupervised Machine Learning , Upper Extremity/physiopathology
9.
Sensors (Basel) ; 19(14), 2019 Jul 19.
Article in English | MEDLINE | ID: mdl-31330986

ABSTRACT

In recent years, sensors in the Internet of Things have become common in human life. Advanced persistent threats (APTs) have caused serious damage to network security, and sensors play an important role in the attack process. Over a long period, attackers infiltrate, attack, conceal, spread, and steal information of target groups through the compound use of various attacking means, while existing security measures based on single-time nodes cannot defend against such attacks. Attackers often exploit the sensors' vulnerabilities to attack targets because the security level of the sensors is relatively low compared with that of the host. We can find APT attacks by checking the suspicious domains generated at different APT attack stages, since every APT attack has to use DNS to communicate. Although this method works, two challenges still exist: (1) the detection method needs to check a large volume of log data; (2) the small number of attacking samples limits conventional supervised learning. This paper proposes an APT detection framework, AULD (Advanced Persistent Threats Unsupervised Learning Detection), to detect suspicious domains in APT attacks by using unsupervised learning. We extract ten important features relating to the host, domain name, and time from a large volume of DNS log data. We then obtain the suspicious cluster by performing unsupervised learning and put all of the domains in that cluster into the list of malicious domains. We collected 1,584,225,274 DNS records from our university network. The experiments show that AULD detected all of the attacking samples and that AULD can effectively detect the suspicious domain names in APT attacks.


Subjects
Computer Security , Software , Unsupervised Machine Learning , Algorithms , Humans
10.
BMC Infect Dis ; 19(1): 649, 2019 Jul 22.
Article in English | MEDLINE | ID: mdl-31331271

ABSTRACT

BACKGROUND: Despite the greater sensitivity of the new dengue clinical classification proposed by the World Health Organization (WHO) in 2009, there is a need for a better definition of warning signs and clinical progression of dengue cases. Classic statistical methods have been used to evaluate risk criteria in dengue patients; however, they usually cannot assess the complexity of dengue clinical profiles. We propose the use of machine learning as an alternative tool to identify the possible characteristics that could be used to develop a risk criterion for severity in dengue patients. METHOD: In this study, we analyzed the clinical profiles of 523 confirmed dengue cases using self-organizing maps (SOM) and random forest algorithms to identify clusters of patients with similar patterns. RESULTS: We identified four natural clusters: two with features of dengue without warning signs or mild disease, one that comprises the severe dengue cases and a high frequency of warning signs, and another with intermediate characteristics. Age appeared as the key variable for splitting the data into these four clusters, although warning signs such as abdominal pain or tenderness, clinical fluid accumulation, mucosal bleeding, lethargy, restlessness, liver enlargement and increased hematocrit associated with a decrease in platelet counts should also be considered to evaluate severity in dengue patients. CONCLUSIONS: These findings suggest that age must be the first characteristic to be considered in places where dengue is hyperendemic. Our results show that warning signs should be closely monitored, mainly in children. Further studies exploring these results in a longitudinal approach may help to understand the full spectrum of dengue clinical manifestations.


Subjects
Dengue/etiology , Unsupervised Machine Learning , Abdominal Pain/etiology , Adolescent , Adult , Age Distribution , Aged , Algorithms , Brazil , Child , Child, Preschool , Cross-Sectional Studies , Dengue/diagnosis , Female , Humans , Infant , Male , Middle Aged , Platelet Count , Retrospective Studies , Stochastic Processes
12.
Ann Otol Rhinol Laryngol ; 128(12): 1170-1176, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31319675

ABSTRACT

OBJECTIVES: This article reviews the principles of unsupervised learning, a novel technique which has increasingly been reported as a tool for the investigation of chronic rhinosinusitis (CRS). It represents a paradigm shift from the traditional approach to investigating CRS based upon the clinically recognized phenotypes of "with polyps" and "without polyps" and instead relies upon the application of complex mathematical models to derive subgroups which can then be further examined. This review article reports on the principles which underlie this investigative technique and some of the published examples in CRS. METHODS: This review summarizes the different types of unsupervised learning techniques which have been described and briefly expounds upon their useful applications. A literature review of studies which have used unsupervised learning is then presented to provide a practical guide to its uses and some of the new directions of investigation suggested by their findings. RESULTS: The commonest unsupervised learning technique applied to rhinology research is cluster analysis, which can be further subdivided into hierarchical and non-hierarchical approaches. The mathematical principles which underpin these approaches are explained within this article. Studies which have used these techniques can be broadly divided into those which have used clinical data only and those which include biomarkers. Studies which include biomarkers adhere closely to the established canon of CRS disease phenotypes, while those that use clinical data may diverge from the typical "polyp versus non-polyp" phenotypes and reflect subgroups of patients who share common symptom modifiers. SUMMARY: Artificial intelligence is increasingly influential in health care research and machine learning techniques have been reported in the investigation of CRS, promising several interesting new avenues for research. However, when critically appraising studies which use this technique, the reader needs to be au fait with the limitations and appropriate uses of its application.


Subjects
Deep Learning , Rhinitis , Sinusitis , Unsupervised Machine Learning , Chronic Disease , Humans
13.
BMC Genomics ; 20(1): 580, 2019 Jul 12.
Article in English | MEDLINE | ID: mdl-31299888

ABSTRACT

BACKGROUND: Our understanding of polyploid genomes is limited by our inability to definitively assign sequences to a specific subgenome without extensive prior knowledge like high resolution genetic maps or genome sequences of diploid progenitors. In theory, existing methods for assigning sequences to individual species from metagenome samples could be used to separate subgenomes in polyploid organisms, however, these methods rely on differences in coarse genome properties like GC content or sequences from related species. Thus, these approaches do not work for subgenomes where gross features are indistinguishable and related genomes are lacking. Here we describe a method that uses rapidly evolving repetitive DNA to circumvent these limitations. RESULTS: By using short, repetitive, DNA sequences as species-specific signals we separated closely related genomes from test datasets and subgenomes from two polyploid plants, tobacco and wheat, without any prior knowledge. CONCLUSION: This approach is ideal for separating the subgenomes of polyploid species with unsequenced or unknown progenitor genomes.


Subjects
DNA, Plant/genetics , Evolution, Molecular , Genomics/methods , Polyploidy , Repetitive Sequences, Nucleic Acid/genetics , Unsupervised Machine Learning , Genome, Plant/genetics , Phylogeny , Tobacco/genetics , Triticum/genetics
14.
Nat Commun ; 10(1): 3045, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31292438

ABSTRACT

In order to advance precision medicine, detailed clinical features ought to be described in a way that leverages current knowledge. Although data collected from biomedical research is expanding at an almost exponential rate, our ability to transform that information into patient care has not kept pace. A major barrier preventing this transformation is that multi-dimensional data collection and analysis is usually carried out without much understanding of the underlying knowledge structure. Here, in an effort to bridge this gap, Electronic Health Records (EHRs) of individual patients are connected to a heterogeneous knowledge network called Scalable Precision Medicine Oriented Knowledge Engine (SPOKE). Then an unsupervised machine-learning algorithm creates Propagated SPOKE Entry Vectors (PSEVs) that encode the importance of each SPOKE node for any code in the EHRs. We argue that these results, alongside the natural integration of PSEVs into any EHR machine-learning platform, provide a key step toward precision medicine.


Subjects
Data Analysis , Data Collection/methods , Unsupervised Machine Learning , Biomedical Research/statistics & numerical data , Electronic Health Records/statistics & numerical data , Precision Medicine/methods
15.
Nature ; 571(7763): 95-98, 2019 07.
Article in English | MEDLINE | ID: mdl-31270483

ABSTRACT

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases, which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing, which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.


Subjects
Data Mining/methods , Knowledge , Materials Science , Natural Language Processing , Research Report , Research , Terminology as Topic , Unsupervised Machine Learning , Electric Conductivity , Electrodes , Iron , Lithium , Magnetics , Reproducibility of Results , Semantics , Temperature
16.
BMC Bioinformatics ; 20(1): 379, 2019 Jul 08.
Article in English | MEDLINE | ID: mdl-31286861

ABSTRACT

BACKGROUND: Unsupervised machine learning methods (deep learning) have shown their usefulness with noisy single cell mRNA-sequencing data (scRNA-seq), where the models generalize well, despite the zero-inflation of the data. A class of neural networks, namely autoencoders, has been useful for denoising of single cell data, imputation of missing values and dimensionality reduction. RESULTS: Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: with specialized training, the autoencoder is not only able to generalize over the data, but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. Our model can, from scRNA-seq data, delineate biologically meaningful modules that govern a dataset, as well as give information as to which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets. CONCLUSIONS: We discover that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. By comparisons with gene signatures of canonical pathways we see that the modules are directly interpretable. The scope of this discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. In comparison with other dimensionality reduction methods, or supervised models for classification, our approach has the benefit of both handling well the zero-inflated nature of scRNA-seq, and validating that the model captures relevant information, by establishing a link between input and decoded data. In perspective, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.


Subjects
Gene Expression Profiling/methods , RNA, Messenger/chemistry , Sequence Analysis, RNA/methods , Unsupervised Machine Learning , Cluster Analysis , RNA, Messenger/metabolism , Single-Cell Analysis
17.
Neural Netw ; 117: 163-178, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31170576

ABSTRACT

With the rapid development of multimedia technology, massive unlabelled data with high dimensionality need to be processed. As a means of dimensionality reduction, unsupervised feature selection has been widely recognized as an important and challenging pre-step for many machine learning and data mining tasks. Traditional unsupervised feature selection algorithms usually assume that the data instances are identically distributed and there is no dependency between them. However, the data instances are not only associated with high dimensional features but also inherently interconnected with each other. Furthermore, the inevitable noises mixed in data could degenerate the performances of previous methods which perform feature selection in original data space. Without label information, the connection information between data instances can be exploited and could help select relevant features. In this work, we propose a robust unsupervised feature selection method which embeds the latent representation learning into feature selection. Instead of measuring the feature importances in original data space, the feature selection is carried out in the learned latent representation space which is more robust to noises. The latent representation is modelled by non-negative matrix factorization of the affinity matrix which explicitly reflects the relationships of data instances. Meanwhile, the local manifold structure of original data space is preserved by a graph based manifold regularization term in the transformed feature space. An efficient alternating algorithm is developed to optimize the proposed model. Experimental results on eight benchmark datasets demonstrate the effectiveness of the proposed method.


Subjects
Unsupervised Machine Learning , Data Mining/methods
18.
BMC Bioinformatics ; 20(1): 326, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31195977

ABSTRACT

BACKGROUND: An important task of macromolecular structure determination by cryo-electron microscopy (cryo-EM) is the identification of single particles in micrographs (particle picking). Due to the necessity of human involvement in the process, current particle picking techniques are time consuming and often result in many false positives and negatives. Adjusting the parameters to eliminate false positives often excludes true particles in certain orientations. The supervised machine learning (e.g. deep learning) methods for particle picking often need a large training dataset, which requires extensive manual annotation. Other reference-dependent methods rely on low-resolution templates for particle detection, matching and picking, and therefore, are not fully automated. These issues motivate us to develop a fully automated, unbiased framework for particle picking. RESULTS: We design a fully automated, unsupervised approach for single particle picking in cryo-EM micrographs. Our approach consists of three stages: image preprocessing, particle clustering, and particle picking. The image preprocessing is based on multiple techniques including: image averaging, normalization, cryo-EM image contrast enhancement correction (CEC), histogram equalization, restoration, adaptive histogram equalization, guided image filtering, and morphological operations. Image preprocessing significantly improves the quality of original cryo-EM images. Our particle clustering method is based on an intensity distribution model which is much faster and more accurate than traditional K-means and Fuzzy C-Means (FCM) algorithms for single particle clustering. Our particle picking method, based on image cleaning and shape detection with a modified Circular Hough Transform algorithm, effectively detects the shape and the center of each particle and creates a bounding box encapsulating the particles. 
CONCLUSIONS: AutoCryoPicker can automatically and effectively recognize particle-like objects from noisy cryo-EM micrographs without the need of labeled training data or human intervention making it a useful tool for cryo-EM protein structure determination.


Subjects
Algorithms , Cryoelectron Microscopy/methods , Image Processing, Computer-Assisted/methods , Unsupervised Machine Learning , Automation , Cluster Analysis , Software
19.
Spectrochim Acta A Mol Biomol Spectrosc ; 222: 117243, 2019 Nov 05.
Article in English | MEDLINE | ID: mdl-31226616

ABSTRACT

Root-knot nematode is a common, highly destructive plant-parasitic pest that infects more than 2000 plant species. Panax notoginseng (P. notoginseng) is one of the most susceptible traditional medicines. More importantly, it is difficult to distinguish the powders of P. notoginseng infected with root-knot nematode from those of healthy P. notoginseng because their color and shape are the same after being ground into powder. In this paper, Attenuated Total Reflection-Fourier Transform Infrared (ATR-FTIR) spectroscopy was used to identify P. notoginseng samples. Multiplicative scatter correction (MSC) was applied to preprocess the spectral data. Competitive adaptive reweighted sampling (CARS) and the successive projection algorithm (SPA) were employed to select feature variables. Density-based spatial clustering of applications with noise (DBSCAN) was adopted to discover groups within the data. We also found that geographical origin is a pivotal factor to consider when identifying unhealthy P. notoginseng. Therefore, we introduced a novel multi-label classification (MLC) method to identify healthy and unhealthy P. notoginseng powders from three different geographical origins. In addition, the binary relevance method (BR), classifier chain (CC), ensembles of classifier chains (ECC), and multilayer perceptron classifier (MLPC) were applied to create classification models; ECC in particular exhibited superior performance.


Subjects
Panax notoginseng/parasitology , Plant Diseases/parasitology , Plant Roots/parasitology , Spectroscopy, Fourier Transform Infrared/methods , Algorithms , Animals , Cluster Analysis , Drugs, Chinese Herbal/chemistry , Nematoda/chemistry , Nematoda/isolation & purification , Panax notoginseng/chemistry , Plant Roots/chemistry , Powders , Supervised Machine Learning , Unsupervised Machine Learning
20.
Nature ; 569(7755): 208-214, 2019 05.
Article in English | MEDLINE | ID: mdl-31068721

ABSTRACT

Software implementations of brain-inspired computing underlie many important computational tasks, from image processing to speech recognition, artificial intelligence and deep learning applications. Yet, unlike real neural tissue, traditional computing architectures physically separate the core computing functions of memory and processing, making fast, efficient and low-energy computing difficult to achieve. To overcome such limitations, an attractive alternative is to design hardware that mimics neurons and synapses. Such hardware, when connected in networks or neuromorphic systems, processes information in a way more analogous to brains. Here we present an all-optical version of such a neurosynaptic system, capable of supervised and unsupervised learning. We exploit wavelength division multiplexing techniques to implement a scalable circuit architecture for photonic neural networks, successfully demonstrating pattern recognition directly in the optical domain. Such photonic neurosynaptic networks promise access to the high speed and high bandwidth inherent to optical systems, thus enabling the direct processing of optical telecommunication and visual data.


Subjects
Biomimetics/methods , Models, Neurological , Pattern Recognition, Automated/methods , Photons , Supervised Machine Learning , Unsupervised Machine Learning , Action Potentials , Computer Systems , Computers , Nerve Net/cytology , Nerve Net/physiology , Neurons/cytology , Neurons/physiology , Synapses/physiology