Results 1 - 2 of 2
1.
Sci Data; 10(1): 772, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37935698

ABSTRACT

Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data in practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks such as image captioning, which have primarily been carried out on natural images, still struggle to produce accurate and meaningful captions for the sketched images often included in scientific and technical documents. The advancement of other tasks, such as 3D reconstruction from 2D images, requires larger datasets with multiple viewpoints. We introduce DeepPatent2, a large-scale dataset providing more than 2.7 million technical drawings with 132,890 object names and 22,394 viewpoints extracted from 14 years of US design patent documents. We demonstrate the usefulness of DeepPatent2 with conceptual captioning, and we further illustrate the potential of our dataset to facilitate other research areas such as 3D image reconstruction and image retrieval.
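
A minimal sketch of how records in a dataset like DeepPatent2 might be loaded and grouped for the tasks mentioned above. The JSON-lines layout and the field names (figure_id, image_path, object_name, viewpoint, caption) are illustrative assumptions rather than the dataset's published schema.

```python
# Illustrative loader for a DeepPatent2-style annotation file.
# Field names and file format are assumptions for this sketch only.
import json
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

@dataclass
class PatentFigure:
    figure_id: str      # e.g. design-patent document id + figure index
    image_path: str     # path to the technical drawing
    object_name: str    # e.g. "chair", "bottle"
    viewpoint: str      # e.g. "perspective view", "top plan view"
    caption: str        # conceptual caption for the drawing

def load_figures(jsonl_path: Path) -> Iterator[PatentFigure]:
    """Stream records from a JSON-lines annotation file."""
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            yield PatentFigure(**json.loads(line))

def group_by_object(figures: Iterator[PatentFigure]) -> dict[str, list[PatentFigure]]:
    """Collect all viewpoints of each object, e.g. to assemble the
    multi-view sets needed for 3D reconstruction."""
    groups: dict[str, list[PatentFigure]] = defaultdict(list)
    for fig in figures:
        groups[fig.object_name].append(fig)
    return groups
```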

2.
Article in English | MEDLINE | ID: mdl-29994210

ABSTRACT

The ability to rank images based on their appearance finds many real-world applications, such as image retrieval or image album creation. Despite the recent dominance of deep learning methods in computer vision, which often deliver superior performance, they are not always the methods of choice because they lack interpretability. In this work, we investigate the possibility of improving the image aesthetic inference of convolutional neural networks with hand-designed features that rely on domain expertise in various fields. We compare hand-crafted feature sets in their ability to predict fine-grained aesthetics scores on two image aesthetics datasets. We observe that even feature sets published earlier are able to compete with more recently published algorithms and that, by combining the algorithms, one can obtain a significant improvement in predicting image aesthetics. Using a tree-based learner, we perform feature elimination to identify the best-performing features overall and across different image categories. Only roughly 15% and 8% of the features are needed to achieve full performance in predicting a fine-grained aesthetic score and in binary classification, respectively. By combining hand-crafted features with meta-features that predict the quality of an image based on CNN features, the model performs better than a baseline VGG16 model. One can, however, achieve a larger improvement in both aesthetics score prediction and binary classification by fusing the hand-crafted features with the penultimate-layer activations. Our experiments indicate an improvement of up to 2.2%, achieving current state-of-the-art binary classification accuracy on the AVA dataset when the hand-designed features are fused with activations from the VGG16 and ResNet50 networks.
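
A minimal sketch of the fusion idea described above: concatenate hand-crafted descriptors with the penultimate-layer (fc2) activations of a pretrained VGG16 and fit a tree-based learner on the fused vector. The specific hand-crafted descriptors, the Keras/scikit-learn APIs, and the GradientBoostingRegressor are assumptions for illustration, not the feature sets or learner evaluated in the paper.

```python
# Sketch: fuse hand-crafted image features with VGG16 penultimate-layer
# activations and train a tree-based regressor to predict an aesthetics score.
import numpy as np
from PIL import Image
from sklearn.ensemble import GradientBoostingRegressor
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

# Penultimate layer ("fc2", 4096-d) of VGG16 as the deep feature extractor.
base = VGG16(weights="imagenet")
deep_extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def hand_crafted_features(img: Image.Image) -> np.ndarray:
    """Toy hand-crafted descriptors: mean RGB, per-channel std (a contrast
    proxy), and a simple colorfulness measure; stand-ins for the feature
    sets compared in the paper."""
    arr = np.asarray(img.convert("RGB"), dtype=np.float32)
    r, g, b = arr[..., 0], arr[..., 1], arr[..., 2]
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = (np.sqrt(rg.std() ** 2 + yb.std() ** 2)
                    + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2))
    return np.concatenate([arr.mean(axis=(0, 1)), arr.std(axis=(0, 1)),
                           [colorfulness]])

def fused_features(path: str) -> np.ndarray:
    """Concatenate hand-crafted descriptors with deep activations."""
    img = Image.open(path)
    x = np.asarray(img.convert("RGB").resize((224, 224)), dtype=np.float32)
    deep = deep_extractor.predict(preprocess_input(x[None]), verbose=0)[0]
    return np.concatenate([hand_crafted_features(img), deep])

def train(paths: list[str], scores: list[float]) -> GradientBoostingRegressor:
    """Fit a tree-based learner on fused features; paths/scores would come
    from an aesthetics dataset such as AVA."""
    X = np.stack([fused_features(p) for p in paths])
    model = GradientBoostingRegressor()
    model.fit(X, scores)
    return model
```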
