Deep Perceptual Losses and Self-supervised Fine-tuning for Image and Video Super-resolution

Super-resolution (SR) has become one of the most critical problems in image and video processing. Chapter 2 of this thesis provides a detailed review of existing Deep Learning (DL) techniques for the SR task, with an emphasis on how DL and analytical techniques can be combined.

Chapter 3 addresses the fact that no perceptual loss functions, such as adversarial or feature losses, have been applied to the task of video super-resolution (VSR). To this end, a new generator network, named VSRResNet, is introduced for the VSR problem, along with a new discriminator architecture to guide VSRResNet during adversarial training. The VSR formulation is enhanced with two regularizers, distance losses in feature space and pixel space, to obtain the final VSRResFeatGAN model. We show that the resulting model produces SR frames of significantly higher perceptual quality than those produced by VSR models trained with non-perceptual objective functions.

However, while Convolutional Neural Networks (CNNs) trained for image and video SR, such as the VSRResFeatGAN model, regularly achieve new state-of-the-art performance, they also suffer from significant drawbacks. One limitation is their lack of robustness at test time to image formation models unseen during training; another is the generation of artifacts and hallucinated content when training Generative Adversarial Networks (GANs) for image and video SR. While the Deep Learning literature focuses on new training schemes and architectural settings to resolve these issues, Chapter 4 describes a novel method that avoids retraining and instead corrects SR results with a fully self-supervised fine-tuning approach. More specifically, at test time, given an image and its known image formation model, the parameters of a trained network are iteratively fine-tuned using a data fidelity loss. We apply the proposed fine-tuning algorithm to multiple state-of-the-art image and video SR CNNs and show that it can successfully correct a sub-optimal SR solution by relying entirely on internal learning at test time. We apply our method both to fine-tuning for unseen image formation models and to the removal of artifacts introduced by GANs.
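
The training objective described for Chapter 3 combines an adversarial term with feature-space and pixel-space distance regularizers. The PyTorch sketch below illustrates such a combined generator loss under assumed interfaces: the generator, discriminator and feature_extractor modules and the loss weights are illustrative placeholders, not the thesis implementation.

import torch
import torch.nn.functional as F

def vsr_perceptual_loss(generator, discriminator, feature_extractor,
                        lr_frames, hr_frame,
                        w_adv=1e-3, w_feat=1.0, w_pix=1.0):
    # Super-resolve the centre frame from a window of low-resolution frames.
    sr_frame = generator(lr_frames)

    # Adversarial term: encourage the discriminator to classify SR frames as real.
    logits = discriminator(sr_frame)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # Feature-space distance between the SR frame and the ground-truth frame.
    feat = F.mse_loss(feature_extractor(sr_frame), feature_extractor(hr_frame))

    # Pixel-space distance between the SR frame and the ground-truth frame.
    pix = F.mse_loss(sr_frame, hr_frame)

    # Weighted sum of the adversarial loss and the two distance regularizers.
    return w_adv * adv + w_feat * feat + w_pix * pix

The test-time correction described for Chapter 4 can likewise be sketched as self-supervised optimization of a data fidelity loss: the super-resolved output is passed back through the known image formation model and compared with the observed low-resolution input, so no ground-truth high-resolution data is needed. In the sketch below, degrade stands in for that known formation model, and the step count and learning rate are illustrative assumptions.

import copy
import torch
import torch.nn.functional as F

def self_supervised_finetune(sr_net, lr_image, degrade, steps=50, lr=1e-5):
    # Fine-tune a copy so the pretrained weights remain untouched.
    net = copy.deepcopy(sr_net)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        sr = net(lr_image)                      # current SR estimate
        # Data fidelity: the SR estimate, re-degraded through the known image
        # formation model, should reproduce the observed low-resolution input.
        fidelity = F.mse_loss(degrade(sr), lr_image)
        fidelity.backward()
        optimizer.step()

    with torch.no_grad():
        return net(lr_image)                    # corrected SR output

Because the fidelity loss depends only on the test image and its formation model, the update relies entirely on internal learning, which is what allows it to adapt to formation models unseen during training or to suppress GAN-induced artifacts without any retraining.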
