Modern digital cameras generally rely on image signal processing (ISP) pipelines to produce naturalistic RGB images. Nevertheless, compared with DSLR cameras, portable mobile devices generally output low-quality images because of their physical limitations. These low-quality images usually suffer from multiple degradations: low resolution owing to small camera sensors, mosaic patterns on account of the camera's color filter array, and subpixel shifts due to camera motion. Such degradations usually restrict the performance of single-image super-resolution methods, which retrieve a high-resolution (HR) image from a single low-resolution (LR) image. Burst image super-resolution instead aims to restore a photo-realistic HR image by exploiting the abundant information in multiple LR images. Lately, the soaring popularity of burst photography has made multi-frame processing an attractive solution for overcoming the limitations of single-image processing. In this work, we therefore propose a generic architecture for multi-frame processing, the adaptive feature consolidation network (AFCNet). To address the challenge of effectively modelling long-range dependencies, with which multi-frame approaches struggle, we use an encoder-decoder-based transformer backbone that learns multi-scale local-global representations. We propose a feature alignment module to align the LR burst frame features. The aligned features are then fused by an abridged pseudo-burst fusion module and reconstructed by adaptive group upsampling modules. Our proposed approach clearly outperforms existing state-of-the-art techniques on benchmark datasets. The experimental results illustrate the effectiveness and generality of the proposed framework in improving the visual quality of HR images. © 2022 IEEE.
- Computer vision
- Pipeline processing systems
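
The three degradations named in the abstract (low resolution, mosaic patterns, and subpixel shifts) can be illustrated with a small synthetic example. The sketch below is not the authors' data pipeline. It is a minimal illustration that assumes an RGGB Bayer layout, block-average downsampling, and integer shifts applied at HR resolution, which become subpixel shifts once the frame is downsampled. The function names `bayer_mosaic`, `downsample`, and `synthesize_burst` are hypothetical.

```python
import numpy as np


def bayer_mosaic(rgb):
    """Keep one color sample per pixel, assuming an RGGB layout."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites (even rows)
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites (odd rows)
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return raw


def downsample(rgb, scale):
    """Average non-overlapping scale x scale blocks (a simplifying assumption)."""
    h, w, c = rgb.shape
    h2, w2 = h // scale, w // scale
    cropped = rgb[: h2 * scale, : w2 * scale]
    return cropped.reshape(h2, scale, w2, scale, c).mean(axis=(1, 3))


def synthesize_burst(hr, num_frames=4, scale=4, seed=0):
    """Build a toy burst of mosaicked LR frames from one HR RGB image."""
    rng = np.random.default_rng(seed)
    frames = []
    for i in range(num_frames):
        # Frame 0 is the unshifted reference; the others get a random
        # HR-pixel shift smaller than `scale`, i.e. a subpixel shift in LR.
        if i == 0:
            dy, dx = 0, 0
        else:
            dy, dx = rng.integers(-scale + 1, scale, size=2)
        # np.roll wraps around the borders, which is acceptable for a toy example.
        shifted = np.roll(hr, shift=(int(dy), int(dx)), axis=(0, 1))
        frames.append(bayer_mosaic(downsample(shifted, scale)))
    return np.stack(frames)  # shape: (num_frames, H // scale, W // scale)
```

Calling `synthesize_burst(hr, num_frames=4, scale=4)` on a 32 x 32 RGB array yields four 8 x 8 single-channel frames. Each frame samples the scene at a slightly different offset, which is the complementary information a burst super-resolution method has to align and fuse.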