PAC learning with approximate predictors

Andrew Turner, Ata Kaban*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Approximate learning machines, including quantised, factorised, hashed, or otherwise compressed predictors, have become popular in the era of small devices, and the quest to explain and guarantee good generalisation abilities for such methods has just begun. In this paper, we study the role of approximability in learning, both in the full precision and the approximated settings. We do this through a notion of sensitivity of predictors to the action of the approximation operator at hand. We prove upper bounds on the generalisation of such predictors, yielding the following main findings, for any PAC-learnable class and any given approximation operator: 1) We show that, under mild conditions, approximable target concepts are learnable from a smaller labelled sample, provided sufficient unlabelled data; 2) We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees; 3) We highlight natural examples of structure in the class of sensitivities, which reduce, and possibly even eliminate, the otherwise abundant requirement of additional unlabelled data, and hence shed new light onto what makes one problem instance easier to learn than another. These results embed the scope of modern model-compression approaches into the general goal of statistical learning theory, which in turn suggests appropriate algorithms through minimising uniform bounds.
Original language: English
Number of pages: 40
Journal: Machine Learning
Early online date: 8 Feb 2023
Publication status: E-pub ahead of print - 8 Feb 2023

Keywords

  • statistical learning
  • generalisation error bounds
  • model-compression
  • approximate learning algorithms

ASJC Scopus subject areas

  • Computer Science (all)
  • Artificial Intelligence
