G-DAIC: A Gaze Initialized Framework for Description and Aesthetic-Based Image Cropping

Nora Horanyi, Yuqi Hou, Ales Leonardis, Hyung Jin Chang

Research output: Contribution to journal › Article › peer-review


Abstract

We propose a new gaze-initialised optimisation framework that generates aesthetically pleasing image crops based on a user description. We extend an existing description-based image cropping dataset by collecting user eye movements corresponding to the image captions. To best leverage the contextual information in the collected gaze data when initialising the optimisation framework, we propose two gaze-based initialisation strategies, Fixed Grid and Region Proposal. In addition, we propose an adaptive Mixed scaling method that finds the optimal output regardless of the size of the generated initialisation region and of the described part of the image. We address the runtime limitation of the state-of-the-art method with an Early termination strategy that reduces the number of iterations required to produce the output. Our experiments show that G-DAIC reduces runtime by 92.11%, and quantitative and qualitative experiments demonstrate that the proposed framework produces higher-quality image crops that more accurately reflect user intention.
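The Early termination strategy mentioned in the abstract can be illustrated with a minimal sketch: stop an iterative crop search once the best score has not improved for a fixed number of iterations, rather than running all iterations. This is our illustrative interpretation, not the paper's implementation; all function and parameter names here are assumptions.

```python
def optimise_crop(score_fn, candidates, max_iters=1000, patience=20, tol=1e-4):
    """Search candidate crops, terminating early when the best score has not
    improved by more than `tol` for `patience` consecutive iterations.
    Illustrative only; names and defaults are not taken from the paper."""
    best_crop, best_score = None, float("-inf")
    stall = 0        # iterations since the last meaningful improvement
    iters_run = 0
    for i in range(max_iters):
        iters_run = i + 1
        crop = candidates[i % len(candidates)]
        score = score_fn(crop)
        if score > best_score + tol:
            best_crop, best_score = crop, score
            stall = 0
        else:
            stall += 1
        if stall >= patience:
            break    # early termination: no recent improvement
    return best_crop, best_score, iters_run
```

With a scoring function that peaks at one candidate, the loop stops `patience` iterations after the peak instead of exhausting the budget, which is the source of the runtime savings the abstract reports.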
Original language: English
Article number: 163
Pages (from-to): 1-19
Number of pages: 19
Journal: Proceedings of the ACM on Human-Computer Interaction
Volume: 7
Issue number: ETRA
DOIs
Publication status: Published - 18 May 2023

Keywords

  • Eye-tracking
  • Gaze-based image cropping
  • Aesthetics
  • Deep network re-purposing
  • Image captioning

