LARGE-SCALE ZERO-SHOT LEARNING IN THE WILD: CLASSIFYING ZOOLOGICAL ILLUSTRATIONS

Highlights

• A zero-shot prototypical learning approach was proposed to deal with the limited availability of training data.

• Methods for training the proposed model with a variable number of multimodal auxiliary sources were compared.

• Effects of training the proposed model with hierarchical prototype loss were measured.

• The ZICE dataset, created from zoological illustrations augmented with multimodal auxiliary data, was introduced and used to test the proposed model.

• The performance of the proposed model was analysed qualitatively on real-world data.

Abstract

In this paper we analyse the classification of zoological illustrations. Historically, zoological illustrations were the modus operandi for documenting new species, and currently they serve as crucial sources for long-term ecological and biodiversity research. Employing computational methods for classification makes these illustrations amenable to such research. Automated species identification is challenging due to the long-tailed nature of the data and the millions of possible classes in the species taxonomy. Success commonly depends on large training sets with many examples per class, but images from only a subset of classes are digitally available, and many images are unlabelled, since labelling requires domain expertise. We explore zero-shot learning to address this problem: features are learned from classes with medium to large samples and then transferred to recognise classes with few or no training samples. We specifically explore how distributed, multimodal background knowledge from data providers such as the Global Biodiversity Information Facility (GBIF), iNaturalist, and the Biodiversity Heritage Library (BHL) can be used to share knowledge between classes for zero-shot learning. We train a prototypical network for zero-shot classification, and introduce fused prototypes (FP) and hierarchical prototype loss (HPL) to optimise the model. Finally, we analyse the performance of the model for use in real-world applications. The experimental results are encouraging, indicating potential for the use of such models in an expert support system, but they also highlight the difficulty of the task, showing the need for research into computer vision methods that can learn from small samples.
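To make the idea of fused prototypes concrete, the sketch below shows a minimal zero-shot classifier in which class prototypes are built by averaging embeddings from several auxiliary modalities (e.g. natural photographs and textual descriptions) and a query illustration is assigned to the nearest prototype. This is not the authors' implementation; the class name, modality names, and dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of zero-shot classification with
# fused prototypes: per-class prototypes are the mean of projected auxiliary
# embeddings, and query images are scored by cosine similarity to each prototype.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedPrototypeClassifier(nn.Module):
    def __init__(self, image_dim: int, aux_dims: dict, embed_dim: int = 256):
        super().__init__()
        # Project query-image features into a shared embedding space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        # One projection head per auxiliary modality (hypothetical names).
        self.aux_proj = nn.ModuleDict(
            {name: nn.Linear(dim, embed_dim) for name, dim in aux_dims.items()}
        )

    def prototypes(self, aux_features: dict) -> torch.Tensor:
        # aux_features[name]: (num_classes, aux_dims[name]); here every class is
        # assumed to have all modalities, although in practice some may be missing.
        parts = [F.normalize(self.aux_proj[name](feats), dim=-1)
                 for name, feats in aux_features.items()]
        # Fused prototype = mean of the per-modality class embeddings.
        return F.normalize(torch.stack(parts).mean(dim=0), dim=-1)

    def forward(self, image_features: torch.Tensor, aux_features: dict) -> torch.Tensor:
        queries = F.normalize(self.image_proj(image_features), dim=-1)
        protos = self.prototypes(aux_features)   # (num_classes, embed_dim)
        # Cosine-similarity logits; training would apply cross-entropy over these.
        return queries @ protos.T                # (batch, num_classes)


if __name__ == "__main__":
    model = FusedPrototypeClassifier(image_dim=512,
                                     aux_dims={"natural_photo": 512, "text": 768})
    imgs = torch.randn(4, 512)                    # query illustrations
    aux = {"natural_photo": torch.randn(10, 512), # 10 candidate species
           "text": torch.randn(10, 768)}
    print(model(imgs, aux).shape)                 # torch.Size([4, 10])
```

Because unseen classes only need auxiliary embeddings to obtain a prototype, the same nearest-prototype rule applies to species with no illustration training samples at all.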