
Learned Hybrid Video Coding for Human Perception and Multiple Machine Vision Tasks

Martin Benjak*, Saifullah Khan*, Yi Hsin Chen, Wen Hsiao Peng, Jörn Ostermann*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, Conference contribution, Research, peer review

Abstract

In this work, we present a learned multi-task video codec that is optimized for human and machine vision. The codec consists of an encoder that maps images from the pixel domain to a latent representation and multiple decoders that map the latent to either an image for human consumption or multiple task-specific features for different machine vision tasks. This allows a single bitstream to be used for multiple tasks while also reducing the decoder complexity for machine vision tasks. Unlike most learned codecs, our method performs inter-coding at the latent level instead of the pixel domain. Experiments show that the proposed method achieves a compression performance for machine vision tasks comparable to other multi-task codecs designed for machine vision only, while also providing video reconstruction.
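The structure described in the abstract (one encoder, inter-coding in the latent domain, and several decoders reading the same bitstream) can be illustrated with a toy sketch. This is not the authors' implementation: the pair-averaging "encoder", the fixed quantization step `STEP`, the previous-latent predictor, and the decoder heads `human_decoder`, `mean_feature_head` and `threshold_feature_head` are all hypothetical placeholders standing in for learned networks, and entropy coding is omitted entirely.

```python
from typing import Callable, Dict, List, Optional

Latent = List[float]
STEP = 1.0  # quantization step size (assumed value, not from the paper)


def encode(frame: List[float]) -> Latent:
    """Toy stand-in for a learned encoder: average adjacent pixel pairs."""
    return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame) - 1, 2)]


def code_sequence(frames: List[List[float]]) -> List[List[int]]:
    """Inter-code at the latent level: predict each latent from the previous
    reconstructed latent and keep only the quantized residual."""
    bitstream: List[List[int]] = []
    prev: Optional[Latent] = None
    for frame in frames:
        latent = encode(frame)
        pred = prev if prev is not None else [0.0] * len(latent)
        symbols = [round((z - p) / STEP) for z, p in zip(latent, pred)]
        # Track the decoder-side reconstruction so both sides stay in sync.
        prev = [p + s * STEP for p, s in zip(pred, symbols)]
        bitstream.append(symbols)
    return bitstream


def decode_latents(bitstream: List[List[int]]) -> List[Latent]:
    """Rebuild the latents from residual symbols only (no pixels involved)."""
    latents: List[Latent] = []
    prev: Optional[Latent] = None
    for symbols in bitstream:
        pred = prev if prev is not None else [0.0] * len(symbols)
        prev = [p + s * STEP for p, s in zip(pred, symbols)]
        latents.append(prev)
    return latents


def human_decoder(latent: Latent) -> List[float]:
    """Placeholder image decoder: upsample the latent back to pixel length."""
    return [v for v in latent for _ in range(2)]


def mean_feature_head(latent: Latent) -> List[float]:
    """Placeholder task head: a single global feature."""
    return [sum(latent) / len(latent)]


def threshold_feature_head(latent: Latent) -> List[float]:
    """Placeholder task head: a binary feature map."""
    return [1.0 if v > 15.0 else 0.0 for v in latent]


# Every decoder consumes the same decoded latents, i.e. the same bitstream.
DECODERS: Dict[str, Callable[[Latent], List[float]]] = {
    "human": human_decoder,
    "task_mean": mean_feature_head,
    "task_threshold": threshold_feature_head,
}

frames = [[10.0, 12.0, 20.0, 22.0], [11.0, 13.0, 20.0, 22.0]]
bitstream = code_sequence(frames)
latents = decode_latents(bitstream)
outputs = {name: [dec(z) for z in latents] for name, dec in DECODERS.items()}
```

In this toy run the second frame is coded as the residual `[1, 0]` against the first frame's latent, and the task heads never reconstruct pixels, which is the sense in which a task-specific decoder can be lighter than a full image decoder.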

Original language: English
Title of host publication: 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
Publisher: IEEE Computer Society
Pages: 1996-2001
Number of pages: 6
ISBN (Electronic): 9798331523794
ISBN (Print): 979-8-3315-2380-0
DOIs
Publication status: Published - 14 Sept 2025
Event: 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States
Duration: 14 Sept 2025 to 17 Sept 2025

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 32nd IEEE International Conference on Image Processing, ICIP 2025
Abbreviated title: ICIP 2025
Country/Territory: United States
City: Anchorage
Period: 14 Sept 2025 to 17 Sept 2025

Keywords

  • feature compression
  • video coding
  • Video coding for machines

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
