Compressing Cross-Domain Representation via Lifelong Knowledge Distillation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most Knowledge Distillation (KD) approaches focus on transferring discriminative information and assume that the data is provided in batches during training. In this paper, we address a more challenging scenario in which different tasks are presented sequentially, at different times, and the learning goal is to transfer the generative factors of visual concepts learned by a Teacher module to a compact latent space represented by a Student module. To achieve this, we develop a new Lifelong Knowledge Distillation (LKD) framework in which the Teacher is trained as an infinite mixture model that automatically increases its capacity to deal with a growing number of tasks. To ensure a compact architecture and to avoid forgetting, we propose measuring the relevance of the knowledge from a new task to each of the experts making up the Teacher module, guiding each expert to capture the probabilistic characteristics of several similar domains. The network architecture is expanded only when learning an entirely different task. The Student is implemented as a lightweight probabilistic generative model. The experiments show that LKD can train a compressed Student module that achieves state-of-the-art results with fewer parameters.
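The expansion rule summarized above can be illustrated with a small sketch. The following Python snippet is a minimal, hypothetical illustration and not the authors' implementation: each Teacher expert is stood in for by a diagonal-Gaussian density (in place of a full generative expert), task relevance is scored as the average log-likelihood of the new task's data under each expert, and a new expert is created only when no existing expert's score exceeds an assumed threshold. All names (`Expert`, `LKDTeacher`, `tau`, `observe_task`) and the threshold value are illustrative assumptions.

```python
# Hypothetical sketch of the relevance-based expansion rule described in
# the abstract. Each expert is a diagonal-Gaussian density; a new task is
# routed to the most relevant expert or, if none is relevant enough, the
# Teacher mixture is expanded with a new expert.
import numpy as np

class Expert:
    """Diagonal-Gaussian density over flattened inputs (stand-in for a generative expert)."""
    def __init__(self, data):
        self.mu = data.mean(axis=0)
        self.var = data.var(axis=0) + 1e-6

    def update(self, data):
        # Fold the new domain's statistics into the expert (simple running refit),
        # so one expert covers several probabilistically similar domains.
        self.mu = 0.5 * (self.mu + data.mean(axis=0))
        self.var = 0.5 * (self.var + data.var(axis=0)) + 1e-6

    def avg_loglik(self, data):
        # Mean per-sample log-density, used here as the task-relevance score.
        d = data - self.mu
        ll = -0.5 * (np.log(2 * np.pi * self.var) + d**2 / self.var).sum(axis=1)
        return ll.mean()

class LKDTeacher:
    def __init__(self, relevance_threshold=-50.0):
        self.experts = []
        self.tau = relevance_threshold  # assumed hyperparameter

    def observe_task(self, data):
        if self.experts:
            scores = [e.avg_loglik(data) for e in self.experts]
            best = int(np.argmax(scores))
            if scores[best] >= self.tau:
                # Task is similar to a known domain: reuse that expert
                # instead of growing the mixture.
                self.experts[best].update(data)
                return best
        # Entirely different task: expand the Teacher with a new expert.
        self.experts.append(Expert(data))
        return len(self.experts) - 1

rng = np.random.default_rng(0)
teacher = LKDTeacher()
task_a = rng.normal(0.0, 1.0, size=(256, 16))   # first domain -> new expert 0
task_b = rng.normal(0.2, 1.0, size=(256, 16))   # similar domain -> expert 0 reused
task_c = rng.normal(8.0, 1.0, size=(256, 16))   # dissimilar domain -> new expert 1
for t in (task_a, task_b, task_c):
    print("assigned to expert", teacher.observe_task(t))
```

Running the snippet assigns the two similar tasks to the same expert and expands the mixture only for the dissimilar third task, mirroring the rule that the architecture grows only when an entirely different task arrives.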
Original language: English
Title of host publication: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Number of pages: 5
DOIs
Publication status: Published - 4 Jun 2023
Event: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Rhodes Island, Greece
Duration: 4 Jun 2023 - 10 Jun 2023

Conference

Conference: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abbreviated title: ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 - 10/06/23

Bibliographical note

© IEEE, 2023. This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy.
