by Daniel Kottke, Georg Krempl and Myra Spiliopoulou.
Classification systems use data (consisting of instances and class labels) to learn a model that predicts the unknown class labels of unseen instances. While big data and ubiquitous connected systems make the generation of instances fast and cheap, labeling these instances remains difficult and expensive because experts are needed for the annotation. Active learning methods aim to optimize the annotation process by selecting for labeling those instances that improve the classifier's performance the most.
In data streams, where instances arrive sequentially, an instance-based active learning method must decide immediately whether or not to acquire the label. Such streams are usually fast and may change over time, so the classifier must be updated regularly. In contrast to pool-based active learning, where the usefulness of an instance for improving the classifier is based on its feature vector (spatial usefulness), streams add the component of time (temporal usefulness).
We studied the effect of different temporal sampling techniques in active learning and propose solutions that answer the question: "When should labels be acquired?" Our experiments show that the distributions of spatial usefulness values also drift over time. Hence, we extend the recently proposed Balanced Incremental Quantile Filter (BIQF) [Probabilistic Active Learning in Datastreams, Kottke et al., 2015] with a trend correction. Our evaluation shows the effectiveness of our method.
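To illustrate the idea (this is a minimal sketch under my own assumptions, not the authors' BIQF implementation), the decision "when to acquire a label" can be framed as comparing each instance's usefulness score against an empirical quantile of recent scores, so that roughly a fixed labeling budget is spent; a simple linear trend estimate (a stand-in for the paper's trend correction) shifts the threshold when the score distribution drifts. The class name, window size, and trend heuristic below are all illustrative choices.

```python
from collections import deque

class QuantileLabelFilter:
    """Sketch of a quantile-based label-acquisition filter for streams.

    Keeps the last `window` usefulness scores and acquires a label when
    the current score exceeds the empirical (1 - budget)-quantile.
    The trend correction here (difference of means between the newer and
    older window halves) is illustrative, not the authors' exact method.
    """

    def __init__(self, budget=0.1, window=100, warmup=10):
        self.budget = budget
        self.warmup = warmup
        self.scores = deque(maxlen=window)

    def _quantile(self, q):
        ordered = sorted(self.scores)
        idx = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[idx]

    def _trend(self):
        # Crude drift estimate: mean of newer half minus mean of older half.
        n = len(self.scores)
        if n < 4:
            return 0.0
        half = n // 2
        values = list(self.scores)
        older, newer = values[:half], values[n - half:]
        return sum(newer) / len(newer) - sum(older) / len(older)

    def acquire(self, score):
        """Return True if a label should be requested for this instance."""
        self.scores.append(score)
        if len(self.scores) < self.warmup:   # warm-up phase: always sample
            return True
        threshold = self._quantile(1.0 - self.budget) + self._trend()
        return score > threshold
```

On a stationary stream of scores, the filter acquires roughly a `budget` fraction of labels; under drift, the trend term moves the threshold along with the scores instead of letting the budget be over- or under-spent.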
Presented at the Tagung der Deutschen Arbeitsgemeinschaft Statistik (DAGSTAT), 2016, Göttingen.
Slides: www.daniel.kottke.eu/talks/2016_DAGSTAT/slides
General Information about PAL: http://kmd.cs.ovgu.de/res/pal/