Results 1 - 4 of 4

1.
Data Brief ; 55: 110712, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39081491

ABSTRACT

Computer vision techniques have significantly enhanced automation across various industries, including textile manufacturing, agriculture, and information technology. In textile manufacturing specifically, these techniques have revolutionized the detection of fiber defects and the quantification of cotton content in fabrics. Traditionally, assessing cotton percentage was a labor-intensive and time-consuming process that relied heavily on manual testing. However, computer vision approaches require a comprehensive dataset of fabric samples, each with a known cotton percentage, to serve as training data for machine learning models. This paper introduces a novel dataset comprising 1300 original images covering thirteen distinct cotton-percentage categories, from 30% to 99%. Through image augmentation (rotation, horizontal flip, vertical flip, width shift, height shift, shear, and zoom), the dataset has been expanded to a total of 27,300 images, enhancing its utility for training and validating computer vision models that determine cotton content in fabrics. By enabling the extraction of pertinent features from fabric images, this dataset has the potential to significantly improve the accuracy and efficiency of computer vision-based cotton percentage detection.
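
As a rough illustration of the augmentation step described in this abstract, the sketch below applies simplified versions of four of the named transforms to a toy image stored as a list of pixel rows. It is not the authors' pipeline: the function names, the restriction to 90-degree rotation, and the zero fill value are all assumptions made for the example. The final lines only check the arithmetic implied by the abstract, where 27,300 total images over 1300 originals gives 21 images per original; whether that count includes the original image is not stated.

```python
# Toy sketch of image augmentation on a grayscale image stored as a list of rows.
# All names here are hypothetical; the abstract does not describe an implementation.

def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise (a simplification of arbitrary-angle rotation)."""
    return [list(column) for column in zip(*image[::-1])]

def width_shift(image, pixels, fill=0):
    """Shift the image horizontally, padding the vacated columns with `fill` (assumed)."""
    width = len(image[0])
    pixels = max(-width, min(width, pixels))  # clamp the shift to the image width
    shifted = []
    for row in image:
        if pixels >= 0:
            shifted.append([fill] * pixels + row[:width - pixels])
        else:
            shifted.append(row[-pixels:] + [fill] * (-pixels))
    return shifted

# Dataset-size arithmetic using only the figures given in the abstract.
ORIGINAL_IMAGES = 1300
TOTAL_IMAGES = 27300
images_per_original = TOTAL_IMAGES // ORIGINAL_IMAGES

if __name__ == "__main__":
    sample = [[1, 2, 3],
              [4, 5, 6]]
    print(horizontal_flip(sample))
    print(vertical_flip(sample))
    print(rotate_90_clockwise(sample))
    print(width_shift(sample, 1))
    print(images_per_original)
```

In practice an image library would perform these transforms, with random parameters drawn per image; the list-based version is only meant to make the operations concrete.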

2.
Data Brief ; 52: 110016, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38293578

ABSTRACT

Compared with other popular research domains, dermatology has received less attention from machine learning researchers. One of the main reasons is the lack of adequate datasets, since collecting samples from the human body is highly sensitive. In recent years, arsenic has emerged as a significant issue for dermatologists. Arsenic is a highly toxic substance found in the earth's crust, and even small amounts can be very harmful to the human body. People exposed to arsenic over a long period through water and food can develop cancer and skin lesions. To contribute in this area, this dataset has been organized so that researchers can understand the impact of this contamination and design solutions using artificial intelligence. To the best of our knowledge, this is the first standard, easy-to-use, and open dataset of arsenic diseases. The images were collected from four places in Bangladesh under the Department of Public Health Engineering, Chapainawabganj, where work on arsenic contamination is under way. The dataset has 8892 skin images, half of them showing people with arsenic effects and the other half showing mixed skin images that are not affected by arsenic. This makes the dataset useful for treating people with arsenic-related conditions. Ultimately, this dataset can attract the attention not only of machine learning researchers but also of scientists, doctors, and other professionals in the associated research fields.

3.
Data Brief ; 47: 108941, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36819904

ABSTRACT

Agriculture is one of the few remaining sectors that has yet to receive proper attention from the machine learning community. The importance of datasets in machine learning cannot be overemphasized. The lack of standard, publicly available agricultural datasets prevents practitioners from harnessing the full benefit of these powerful predictive tools and techniques. To improve this situation, we develop what is, to the best of our knowledge, the first standard, ready-to-use, and publicly available dataset of mango leaves. The images were collected from four mango orchards in Bangladesh, one of the top mango-growing countries in the world. The dataset contains 4000 images of about 1800 distinct leaves covering seven diseases. Although the dataset was developed using mango leaves from Bangladesh only, the diseases covered are common across many countries, so the dataset is likely to be applicable to identifying mango diseases elsewhere as well, thereby boosting mango yield. This dataset is expected to draw wide attention from machine learning researchers and practitioners in the field of automated agriculture.

4.
BMC Bioinformatics ; 9: 414, 2008 Oct 04.
Article in English | MEDLINE | ID: mdl-18834544

ABSTRACT

BACKGROUND: Eukaryotic promoter prediction using computational analysis is one of the most difficult tasks in computational genomics, and it is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has necessitated better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date, but they have yet to provide acceptable predictive performance. One obvious way to improve on current methods is to devise a better system for selecting appropriate features of promoters that distinguish them from non-promoters. Secondly, improved performance can be achieved by enhancing the predictive ability of the machine learning algorithms used. RESULTS: In this paper, a novel approach is presented in which 128 4-mer motifs, in conjunction with a non-linear machine learning algorithm utilising a Support Vector Machine (SVM), are used to distinguish between promoter and non-promoter DNA sequences. Applied to plant, Drosophila, human, mouse and rat sequences, the classification model showed 7-fold cross-validation accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35% respectively. The high sensitivity and specificity values of 0.86 and 0.90 for plant; 0.96 and 0.92 for Drosophila; 0.88 and 0.92 for human; 0.78 and 0.84 for mouse; and 0.82 and 0.80 for rat demonstrate that this technique is less prone to false positive results and exhibits better performance than many other tools. Moreover, this model successfully identifies the location of the promoter using a TATA weight matrix. CONCLUSION: The high sensitivity and specificity indicate that 4-mer frequencies in conjunction with supervised machine learning methods can be beneficial in the identification of RNA pol II promoters compared with other methods. This approach can be extended to identify promoters in sequences from other eukaryotic genomes.


Subject(s)
Artificial Intelligence , Nucleic Acid Conformation , Promoter Regions, Genetic , RNA Polymerase II/genetics , Sequence Analysis, DNA/methods , Animals , Databases, Nucleic Acid , Drosophila Proteins/genetics , Eukaryotic Cells , Genomics/methods , Humans , Mice , Rats , Reproducibility of Results , Sensitivity and Specificity , Structure-Activity Relationship
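
To make the 4-mer features in the BMC Bioinformatics abstract above more concrete, here is a minimal sketch of how 4-mer frequencies might be computed from a DNA sequence, together with the standard definitions of sensitivity and specificity. It is an illustration under stated assumptions, not the authors' method: the function names are hypothetical, windows containing ambiguous bases are simply skipped, and all 256 possible 4-mers are counted, whereas the paper uses 128 motifs whose selection is not described in the abstract. The SVM classifier itself is not shown.

```python
from itertools import product

ALPHABET = "ACGT"

def all_kmers(k=4):
    """Return every k-mer over A, C, G, T in lexicographic order (256 for k=4)."""
    return ["".join(letters) for letters in product(ALPHABET, repeat=k)]

def kmer_frequencies(sequence, k=4):
    """Relative frequency of each k-mer using overlapping windows.

    Windows containing characters outside A/C/G/T (e.g. N) are skipped;
    this handling is an assumption, not taken from the abstract.
    """
    kmers = all_kmers(k)
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGT")
    print(len(features))
    print(features[all_kmers().index("ACGT")])
    # Hypothetical confusion-matrix counts chosen only to exercise the formulas.
    print(sensitivity_specificity(tp=86, fn=14, tn=90, fp=10))
```

A feature vector like this, computed for each promoter and non-promoter sequence, is the kind of input a supervised classifier such as an SVM would be trained on.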