Probabilistic Active Learning in Datastreams
Daniel Kottke, Georg Krempl, Myra Spiliopoulou
Knowledge Management and Discovery Lab Otto-von-Guericke-University Magdeburg, Germany
daniel.kottke@ovgu.de
www.daniel.kottke.eu/talks/2015_IDA
Pool vs. Stream Learning
The spatial component is extended with temporal information
Classification model might change (Drift)
Fast, endless instance generation
Efficient (on-line) algorithms required
Applications:
Data from Twitter, sensors, bank transactions
Acquiring the next label?
Spatial component: where in feature space to buy (which feature vectors)?
Temporal component: when to buy?
Budgets in streams
Pools: absolute number (e.g. stop after 40 labels)
Streams: relative definition necessary (e.g. buy 10%)
Exact budget fit would require sequential acquisitions
Better: spend the budget in expectation and control the variance through a tolerance window (see the sketch below)
Here: the budget is constant over time, which yields constant labeling capacities, permanently updated models, and a fair evaluation
Streams usually do not end \(\rightarrow\) a relative budget definition is necessary
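For intuition, a tiny bookkeeping sketch in Python (all numbers and variable names are hypothetical) of how a relative budget with a tolerance window can be tracked:

```python
# Hypothetical bookkeeping for a relative budget with a tolerance window.
b, w_tol = 0.10, 50            # target budget: 10 %, tolerance window: 50 labels
seen, acquired = 1000, 97      # instances processed, labels bought so far

acq_left = b * seen - acquired     # acquisitions still "owed" to the target
print(acq_left)                    # 3.0 -> slightly behind the target budget
assert abs(acq_left) <= w_tol      # variance stays within the tolerance window
```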
Divide and Conquer
Measure spatial usefulness from Probabilistic Active Learning
Choose the most useful instances based on this value
Spatial Selection (Recap): Probabilistic Active Learning
Evaluates each labeling candidate in its neighborhood using its label statistics:
Posterior probability (\(\hat{p}\))
Number of labels (\(n\))
Image: "Probabilistic Active Learning: Towards Combining Versatility, Optimality and Efficiency" by G. Krempl, D. Kottke, M. Spiliopoulou, Proc. of the 17th Int. Conf. on Discovery Science, Bled, 2014. Springer.
Spatial Selection (Recap): Probabilistic Active Learning
Probabilistic description of expected performance gain using the expectation over:
the possible label outcomes (\(y\)) of the candidate
the true posterior probability (\(p\))
\[
\mathrm{pgain}\big((n, \hat{p})\big) = \mathrm{E}_{p}\Big[ \mathrm{E}_{y} \big[ \mathrm{gain}_{p}((n, \hat{p}), y) \big] \Big]
\\
= \int_0^1 \mathrm{Beta}_{n\hat{p}+1,\, n(1-\hat{p})+1}(p) \cdot \sum_{y \in \{0,1\}} \mathrm{Ber}_{p}(y) \cdot \mathrm{gain}_{p}\big((n, \hat{p}), y\big) \, \mathrm{d}p
\]
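As a rough illustration, the following Python sketch evaluates this integral numerically. The gain term is an assumption here: following the PAL paper's idea, it is taken as the change in expected accuracy after updating the label statistics with \(y\); the slides themselves do not spell out \(\mathrm{gain}_p\).

```python
# Numerical sketch of pgain; the accuracy-based gain term is an assumption.
from scipy import integrate
from scipy.stats import beta

def acc(p, p_hat):
    """Expected accuracy of predicting the majority class of p_hat,
    given the true posterior p of the positive class."""
    return p if p_hat >= 0.5 else 1.0 - p

def gain(n, p_hat, y, p):
    """Assumed gain: accuracy change after adding label y to the statistics."""
    p_hat_new = (n * p_hat + y) / (n + 1)
    return acc(p, p_hat_new) - acc(p, p_hat)

def pgain(n, p_hat):
    """E_p[E_y[gain]] with p ~ Beta(n*p_hat + 1, n*(1 - p_hat) + 1), y ~ Ber(p)."""
    def integrand(p):
        prior = beta.pdf(p, n * p_hat + 1, n * (1 - p_hat) + 1)
        exp_y = p * gain(n, p_hat, 1, p) + (1 - p) * gain(n, p_hat, 0, p)
        return prior * exp_y
    value, _ = integrate.quad(integrand, 0.0, 1.0)
    return value

print(pgain(n=2, p_hat=0.5))    # few labels, uncertain region -> higher pgain
print(pgain(n=40, p_hat=0.9))   # many labels, confident region -> (near) zero
```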
Temporal selection: Problem Specification
Input: Single-value stream of spatial usefulness values
Arbitrary range
Arbitrary distribution
Changing distribution (dependent on previous decisions)
Task: Select the best b % of this value stream immediately (online)
Subtask: The difference between the current and the target budget must not exceed a given tolerance window
Temporal selection: Main Idea
Store the most recent usefulness values in a list
When a new value appears:
Determine its relative rank in that list
If it belongs to the better b % of that list, acquire the corresponding label
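A minimal sketch of this windowed rank test (the Incremental Quantile Filter named on the next slide); class and parameter names are illustrative, and the paper's implementation may differ in detail:

```python
from collections import deque

class IQF:
    """Sketch: keep the w most recent usefulness values and acquire a label
    if the newest value ranks among the best b fraction of that window."""
    def __init__(self, w=100, budget=0.1):
        self.window = deque(maxlen=w)   # most recent usefulness values
        self.budget = budget            # target acquisition fraction b

    def acquire(self, value):
        self.window.append(value)
        # relative rank: fraction of stored values the new value exceeds
        rank = sum(v < value for v in self.window) / len(self.window)
        return rank >= 1.0 - self.budget
```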
Temporal selection: Incremental Quantile Filter (IQF) - Properties
Selects the best b % of a single value stream in expectation
Only works for value streams coming from a single distribution
But: spatial usefulness probably decreases as classification performance increases over time \(\rightarrow\) fewer labels would be acquired
Idea: a threshold correction that ensures the target budget is met w.r.t. a tolerance window
Temporal selection: Balancing
Tolerance window (\(w_\textrm{tol}\)):
maximal difference between the current and the target budget
If there are label acquisitions left (\(acq_\textrm{left} > 0\))
\(\rightarrow\) decrease the threshold \(\theta\) (and vice versa)
\[
\theta_\textrm{bal} = \theta - f \cdot acq_\textrm{left}
\]
How to define \(f\)?
Temporal selection: Balancing
Defining the normalization factor \(f\):
Scales with the range of the data (\(\Delta\))
The larger the tolerance window (\(w_\textrm{tol}\)), the smaller the correction
\[
\theta_\textrm{bal} = \theta - \Delta \cdot \frac{acq_\textrm{left}}{w_\textrm{tol}}
\]
\(\theta_\textrm{bal}\) - Balanced threshold
\(\theta\) - IQF acquisition threshold
\(\Delta\) - Data range of IQF window
\(w_\textrm{tol}\) - Tolerance window size
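A sketch of this correction in Python; computing \(\theta\) as the empirical \((1-b)\)-quantile of the IQF window and the bookkeeping of \(acq_\textrm{left}\) are illustrative assumptions around the slide's formula:

```python
import numpy as np

def balanced_threshold(window, budget, acq_left, w_tol):
    """theta_bal = theta - Delta * acq_left / w_tol (formula from the slide)."""
    vals = np.sort(np.asarray(window, dtype=float))
    theta = np.quantile(vals, 1.0 - budget)   # IQF acquisition threshold
    delta = vals[-1] - vals[0]                # data range of the IQF window
    return theta - delta * acq_left / w_tol

# Example: being 3 acquisitions behind the target lowers the threshold
window = [0.05, 0.10, 0.20, 0.40, 0.45]
print(balanced_threshold(window, budget=0.2, acq_left=3, w_tol=50))
```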
Experimental Settings
Datasets: 5 synthetic, 2 real
Algorithms:
PAL + BIQF
Random, Variable Uncertainty¹ (VarUncer), Split¹
Split + BIQF, VarUncer + BIQF
100 random test and train streams
Different budgets \(b \in \{0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 0.7\}\)
BIQF parameter: \(w = 100, w_{\mathrm{tol}} = 50\)
¹ I. Zliobaite, A. Bifet, B. Pfahringer, G. Holmes: Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 2014.
Evaluation of Active Learning
Conclusion
An active learning algorithm for data streams that
is able to reach a defined budget within a given tolerance window
separates the spatial and temporal usefulness
uses Probabilistic Active Learning for spatial usefulness
uses the new Balanced Incremental Quantile Filter (BIQF) to determine the temporal usefulness of instances
is superior to state-of-the-art methods (especially at the beginning of the stream)
Thank you for your attention!
Slides, Paper, Bibtex:
www.daniel.kottke.eu/talks/2015_IDA
Supplemental material:
kmd.cs.ovgu.de/res/pal/
Probabilistic Active Learning in Datastreams
Daniel Kottke, Georg Krempl, Myra Spiliopoulou
Fourteenth International Symposium on Intelligent Data Analysis (IDA)
Saint-Etienne, France, 2015. Springer.