Search | BVS - Ministry of Health

Anomaly Detection and Inter-Sensor Transfer Learning on Smart Manufacturing Datasets.

Abdallah, Mustafa; Joung, Byung-Gun; Lee, Wo Jae; Mousoulis, Charilaos; Raghunathan, Nithin; Shakouri, Ali; Sutherland, John W; Bagchi, Saurabh.

Sensors (Basel) ; 23(1), 2023 Jan 02.

Article in English | MEDLINE | ID: mdl-36617091

ABSTRACT

Smart manufacturing systems are considered the next generation of manufacturing applications. One important goal of a smart manufacturing system is to rapidly detect and anticipate failures in order to reduce maintenance cost and minimize machine downtime. This often boils down to detecting anomalies in the sensor data acquired from the system, whose characteristics vary with the operating point of the environment or machines, such as the RPM of the motor. In this paper, we analyze four datasets from sensors deployed in manufacturing testbeds. We detect the defect level in each sensor's data using deep learning techniques. We also evaluate the performance of several traditional and ML-based forecasting models for predicting the time series of sensor data. We show that careful selection of training data by aggregating multiple predictive RPM values is beneficial. Then, considering the sparse data from one kind of sensor, we perform transfer learning from a high-data-rate sensor to perform defect type classification. We release our manufacturing database corpus (4 datasets) and code for anomaly detection and defect type classification for the community to build on. Taken together, we show that predictive failure classification can be achieved, paving the way for predictive maintenance.
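The abstract above ties anomaly detection to forecasting of the sensor time series. A minimal sketch of that general idea follows, assuming a simple moving-average forecaster rather than the deep learning or ML-based models the paper evaluates. The function names, the window size, and the sample readings are hypothetical. The residual limit is calibrated on data assumed to be defect-free, then applied to new readings.

```python
from statistics import mean, stdev


def forecast_residuals(series, window):
    """Residual of each point against the mean of the previous `window` points."""
    return [
        series[i] - mean(series[i - window:i])
        for i in range(window, len(series))
    ]


def fit_residual_limit(normal_series, window=5, k=3.0):
    """Calibrate a residual limit (k standard deviations) on defect-free data."""
    return k * stdev(forecast_residuals(normal_series, window))


def flag_anomalies(series, window, limit):
    """Return indices whose forecast residual exceeds the calibrated limit."""
    residuals = forecast_residuals(series, window)
    return [i + window for i, r in enumerate(residuals) if abs(r) > limit]


# Hypothetical vibration readings at a fixed operating point (assumed defect-free).
normal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0]
limit = fit_residual_limit(normal, window=5, k=3.0)

# A spike appended to the same readings; points right after the spike may also
# be flagged, because the spike contaminates the moving-average window.
flagged = flag_anomalies(normal + [8.0, 1.0], window=5, limit=limit)
```

Because sensor characteristics depend on the operating point, a limit calibrated at one RPM would not necessarily carry over to another; a separate calibration per operating point is one plausible way to handle that in this sketch.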

Subjects

Commerce; Machine Learning; Databases, Factual; Time Factors

Temperature Self-Calibration of Always-On, Field-Deployed Ion-Selective Electrodes Based on Differential Voltage Measurement.

Saha, Ajanta; Yermembetova, Aiganym; Mi, Ye; Gopalakrishnan, Sarath; Sedaghat, Sotoudeh; Waimin, Jose; Wang, Pengcheng; Glassmaker, Nicholas; Mousoulis, Charilaos; Raghunathan, Nithin; Bagchi, Saurabh; Rahimi, Rahim; Shakouri, Ali; Wei, Alexander; Alam, Muhammad A.

ACS Sens ; 7(9): 2661-2670, 2022 09 23.

Article in English | MEDLINE | ID: mdl-36074898

ABSTRACT

Originally developed for use in controlled laboratory settings, potentiometric ion-selective electrode (ISE) sensors have recently been deployed for continuous, in situ measurement of analyte concentration in agricultural (e.g., nitrate), environmental (e.g., ocean acidification), industrial (e.g., wastewater), and health-care sectors (e.g., sweat sensors). However, due to uncontrolled temperature and lack of frequent calibration in these field applications, it has been difficult to achieve accuracy comparable to the laboratory setting. In this paper, we propose a novel temperature self-calibration method in which the ISE sensors serve as their own thermometer and can therefore precisely measure the analyte concentration under field conditions by compensating for temperature variations. We validate the method with controlled experiments using pH and nitrate ISEs, which use the Nernst principle for electrochemical sensing. We show that, using temperature self-calibration, pH and nitrate can be measured within 0.3% and 5% of the true concentration, respectively, under varying concentrations and temperature conditions. Moreover, we perform a field study to continuously monitor the nitrate concentration of an agricultural field over a period of 6 days. Our temperature self-calibration approach determines the nitrate concentration within 4% of the ground truth measured by laboratory-based high-precision nitrate sensors. Our approach is general and would allow battery-free temperature-corrected analyte measurement for all Nernst principle-based sensors being deployed as wearable or implantable sensors.
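To illustrate why uncontrolled temperature matters for Nernst-based sensors, the sketch below evaluates the textbook Nernst relation, E = E0 + (RT ln 10 / zF) log10(a), and shows the error introduced by inverting a reading with the wrong temperature. It is not the paper's differential-voltage self-calibration method: here the temperature is simply supplied as an input, the standard potential E0 is assumed to be temperature-independent, and all function names and numeric readings are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_slope(temp_c, charge):
    """Nernst slope in volts per decade of ion activity at temp_c (Celsius)."""
    return R * (temp_c + 273.15) * math.log(10) / (charge * F)


def electrode_potential(e0, activity, temp_c, charge):
    """Ideal electrode potential for a given ion activity and temperature."""
    return e0 + nernst_slope(temp_c, charge) * math.log10(activity)


def activity_from_potential(e, e0, temp_c, charge):
    """Invert the Nernst relation using the temperature assumed by the reader."""
    return 10 ** ((e - e0) / nernst_slope(temp_c, charge))


# Hypothetical pH electrode (H+, charge +1) reading a pH 7 sample at 35 C.
reading = electrode_potential(e0=0.2, activity=1e-7, temp_c=35.0, charge=1)

# Interpreting the reading with the true temperature recovers pH 7;
# assuming 25 C instead shifts the apparent pH by roughly 0.23 units.
true_ph = -math.log10(activity_from_potential(reading, 0.2, 35.0, 1))
apparent_ph = -math.log10(activity_from_potential(reading, 0.2, 25.0, 1))
```

For an anion such as nitrate, `charge` would be -1 and the slope changes sign; the temperature dependence has the same form.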

Subjects

Ion-Selective Electrodes; Nitrates; Calibration; Hydrogen-Ion Concentration; Nitrates/analysis; Seawater; Temperature; Wastewater

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers.

Mahadik, Kanak; Wright, Christopher; Kulkarni, Milind; Bagchi, Saurabh; Chaterji, Somali.

Sci Rep ; 9(1): 14882, 2019 10 16.

Article in English | MEDLINE | ID: mdl-31619717

ABSTRACT

Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, the unavailability of highly parallel and scalable de novo assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBG). However, this process of sequentially iterating from small to large k-values slows down assembly. In this paper, we propose ScalaDBG, which transforms this sequential process by building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to "patch" a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, both scaling up on all cores of a node and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG assembles genomes faster than IDBA-UD, with similar accuracy, on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).
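The core idea in the abstract above, building one de Bruijn graph per k-value concurrently instead of one after another, can be sketched as follows. This is a toy illustration only: it does not implement ScalaDBG's patching of higher-k graphs with lower-k contigs, it ignores reverse complements and error correction, and it uses a thread pool for simplicity (real speedups in Python would need processes or native code). The function names and the reads are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_dbg(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return dict(graph)


def build_dbgs_for_all_k(reads, k_values):
    """Build the graph for every k-value concurrently and return {k: graph}."""
    k_values = list(k_values)
    with ThreadPoolExecutor() as pool:
        graphs = pool.map(lambda k: build_dbg(reads, k), k_values)
        return dict(zip(k_values, graphs))


# Hypothetical overlapping reads.
reads = ["ACGTAC", "CGTACG"]
graphs = build_dbgs_for_all_k(reads, [3, 4])
```

Each graph is independent of the others at construction time, which is what makes the per-k work parallelizable in the first place.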

Subjects

Algorithms; Contig Mapping/methods; Genome; Sequence Analysis, DNA/statistics & numerical data; Software; Base Sequence; Benchmarking; Datasets as Topic; Escherichia coli/genetics; High-Throughput Nucleotide Sequencing; Humans; Staphylococcus aureus/genetics

Federation in genomics pipelines: techniques and challenges.

Chaterji, Somali; Koo, Jinkyu; Li, Ninghui; Meyer, Folker; Grama, Ananth; Bagchi, Saurabh.

Brief Bioinform ; 20(1): 235-244, 2019 01 18.

Article in English | MEDLINE | ID: mdl-28968781

ABSTRACT

Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. SBGrid Consortium for structural biology and Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, with fast-increasing data volumes and increasing processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Because MG-RAST is a widely used resource, we have to move toward federation without disrupting its 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. It is hoped that our manuscript can serve to spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus allow them to scale to support larger user bases.

Subjects

Genomics/statistics & numerical data; Information Dissemination/methods; Big Data; Computational Biology/methods; Confidentiality; Databases, Genetic/statistics & numerical data; Genetic Privacy; Humans; Metagenomics/statistics & numerical data; Software; United States

MG-RAST version 4: lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis.

Meyer, Folker; Bagchi, Saurabh; Chaterji, Somali; Gerlach, Wolfgang; Grama, Ananth; Harrison, Travis; Paczian, Tobias; Trimble, William L; Wilke, Andreas.

Brief Bioinform ; 20(4): 1151-1159, 2019 07 19.

Article in English | MEDLINE | ID: mdl-29028869

ABSTRACT

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and to enable efficient high-performance implementation of the community's data analysis tasks.

Subjects

High-Throughput Nucleotide Sequencing/methods; Metagenome; Metagenomics/methods; Software; Algorithms; Budgets; Computational Biology/methods; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Internet; Metagenomics/economics; Metagenomics/statistics & numerical data; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; User-Computer Interface; Workflow

Erratum to: 'MicroRNA target prediction using thermodynamic and sequence curves'.

Ghoshal, Asish; Shankar, Raghavendran; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali.

BMC Genomics ; 17: 216, 2016 Mar 09.

Article in English | MEDLINE | ID: mdl-26960331

The MG-RAST metagenomics database and portal in 2015.

Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; Glass, Elizabeth; Harrison, Travis; Keegan, Kevin P; Paczian, Tobias; Trimble, William L; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali; Meyer, Folker.

Nucleic Acids Res ; 44(D1): D590-4, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26656948

ABSTRACT

MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200,000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.

Subjects

Databases, Nucleic Acid; Metagenomics; Internet; Sequence Alignment

MicroRNA target prediction using thermodynamic and sequence curves.

Ghoshal, Asish; Shankar, Raghavendran; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali.

BMC Genomics ; 16: 999, 2015 Nov 25.

Article in English | MEDLINE | ID: mdl-26608597

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. There have been several computational methods for the identification of target mRNAs for miRNAs. However, these have considered all contributory features as scalar representations, primarily as thermodynamic or sequence-based features. Further, a majority of these methods solely target canonical sites, which are sites with "seed" complementarity. Here, we present a machine-learning classification scheme, titled Avishkar, which captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, separately for various input features, such as thermodynamic and sequence features. Further, we use a principled approach to uniformly model canonical and non-canonical seed matches, using a novel seed enrichment metric. RESULTS: We demonstrate that a large number of seed-match patterns have high enrichment values, conserved across species, and that the majority of miRNA binding sites involve non-canonical matches, corroborating recent findings. Using spatial curves and popular categorical features, such as target site length and location, we train a linear SVM model, utilizing experimental CLIP-seq data. Our model significantly outperforms all established methods, for both canonical and non-canonical sites. We achieve this while using a much larger candidate miRNA-mRNA interaction set than prior work. CONCLUSIONS: We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves, specifically about 20% better than the state-of-the-art, for different species (human or mouse), or different target types (canonical or non-canonical). To the best of our knowledge, we provide the first distributed framework for microRNA target prediction based on Apache Hadoop and Spark.
AVAILABILITY: All source code and data are publicly available at https://bitbucket.org/cellsandmachines/avishkar.
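The abstract above distinguishes canonical sites, defined by "seed" complementarity, from non-canonical ones. As a rough illustration of the canonical case only, the sketch below scans an mRNA for exact Watson-Crick matches to the miRNA seed, taken here as nucleotides 2 to 8 from the 5' end (a common convention, assumed for this example). It is not the Avishkar method, which models interaction profiles with B-spline curves and an SVM; the function names and the mRNA fragment are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_target_site(mirna, start=2, end=8):
    """Return the mRNA motif (5'->3') complementary to the miRNA seed.

    `start` and `end` are 1-based, inclusive positions from the miRNA 5' end.
    """
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_canonical_sites(mirna, mrna):
    """Return 0-based mRNA positions where the seed target motif occurs."""
    site = seed_target_site(mirna)
    return [
        i for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


# let-7a-style miRNA sequence and a hypothetical mRNA fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
mrna = "AAACUACCUCAGG"
sites = find_canonical_sites(mirna, mrna)
```

Non-canonical sites, which the abstract reports as the majority of binding sites, tolerate mismatches or bulges in the seed and would not be found by an exact-match scan like this one.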

Subjects

Binding Sites; Computational Biology/methods; MicroRNAs/chemistry; MicroRNAs/genetics; RNA Interference; RNA, Messenger/chemistry; RNA, Messenger/genetics; Thermodynamics; 3' Untranslated Regions; 5' Untranslated Regions; Animals; Humans; Mice; ROC Curve; Reproducibility of Results; Sequence Analysis, RNA; Support Vector Machine
