Latest Publications

Automatic Sub-Task Focus: LifeInsight’s Contribution

In this paper, we introduce the automation approach of LifeInsight, a retrieval system designed explicitly for the NTCIR-17 Lifelog-5 Automatic Task, facilitating a seamless search experience and efficient data mining. Our method is a two-step process: we first enrich the metadata from the raw query, then compose the retrieval method from the input entities.

NTCIR-17

Jan 1, 2023

Lifelog Discovery Assistant: Suggesting Prompts and Indexing Event Sequences for FIRST

In this paper, we present the fourth iteration of our participating system FIRST. This year, we adopt generative models to give the system predictive ability rather than relying entirely on the user to input the query. We also index a sequence of images as an event for improved search speed. Finally, we demonstrate how the additional features can assist users in searching.

Proceedings of the 6th Annual ACM Lifelog Search Challenge

Jan 1, 2023

ANIMAR: Text/Sketch-based 3D Animal Fine-Grained Retrieval

This paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval.

Computers & Graphics

Jan 1, 2023

V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023

We present a new version of our interactive video retrieval system V-FIRST. Besides the existing features of querying by textual descriptions and visual examples, we propose the usage of an image generator that can generate images from a text prompt as a means to bridge the domain gap. We also include a novel referring expression segmentation module to highlight the objects in an image. This is the first step towards providing adequate explainability to retrieval results, ensuring that the system can be trusted and used in domain-specific and critical scenarios.

MultiMedia Modeling: 29th International Conference, MMM 2023, Bergen, Norway

Jan 1, 2023

Semi-supervised Organ Segmentation with Mask Propagation Refinement and Uncertainty Estimation for Data Generation

We present a novel two-stage method that employs various 2D-based techniques to deal with the 3D segmentation task. In most previous challenges, 2D CNNs were unlikely to be comparable with 3D CNNs, since 2D models can hardly capture temporal information. In light of that, we propose using the recent state-of-the-art technique in video object segmentation, combining it with other semi-supervised training techniques to leverage the extensive unlabeled data...

MICCAI 2022 Challenge, FLARE

Jan 1, 2022

Text query based traffic video event retrieval with global-local fusion embedding

Retrieving event videos based on textual descriptions is a promising research topic in the fast-growing data field. However, traffic data increases every day, so intelligent traffic system management that works in conjunction with humans is essential to speed up the search. We propose a multi-module system that delivers accurate results while meeting objectives including explainability and scalability at the same time...

IEEE/CVF Conference on Computer Vision and Pattern Recognition

Jan 1, 2022

Pothole and crack detection in the road pavement using images and RGB-D data

This paper describes the methods submitted for evaluation to the SHREC 2022 track on pothole and crack detection in the road pavement. A total of 7 different runs for the semantic segmentation of the road surface are compared: 6 from the participants plus a baseline method. All methods exploit deep learning techniques, and their performance is tested using the same environment...

Computers & Graphics

Jan 1, 2022

Facial Data De-identification with Adversarial Generation and Perturbation Methods

The 2021 MediaEval Multimedia Evaluation introduces a new data de-identification task, whose goal is to explore methods for obscuring driver identity in driver-facing video recordings while maintaining visible human behavioral information. Interested in the challenge, our HCMUS team participates by exploring different ideas to tackle the problem...

MediaEval

Jan 1, 2021

Retrieval of cultural heritage objects

This paper presents the methods and results of the SHREC’21 track on a dataset of cultural heritage (CH) objects. We present a dataset of 938 scanned models that have varied geometry and artistic styles. For the competition, we propose two challenges: the retrieval-by-shape challenge and the retrieval-by-culture challenge. The former aims at evaluating the ability of retrieval methods to discriminate cultural heritage objects by overall shape...

Computers & Graphics

Jan 1, 2021

Efficient methods of Metadata Embedding and Augmentation for Visual Sentiment Analysis

The Visual Sentiment Analysis Task, a new task in the Multimedia Evaluation 2021 Challenge, concentrates on recognizing emotional responses to natural disaster images. Our team applies different approaches, based on multiple pretrained models and a range of techniques, to the 3 subtasks, each of which has a different set of labels...

MediaEval

Jan 1, 2021

Image-Text Fusion for Automatic News-Images Re-Matching

Matching text and images based on their semantics plays an important role in cross-media retrieval. In news in particular, the connection between text and images is highly ambiguous. In the context of the MediaEval 2020 Challenge, we propose three multi-modal methods for mapping the text and images of news articles to a shared space in order to perform efficient cross-retrieval...

MediaEval

Jan 1, 2020

Extended Monocular Image Based 3D Model Retrieval

Monocular image based 3D object retrieval has attracted growing attention in the field of 3D object retrieval. However, research on 3D object retrieval based on 2D images is still challenging, mainly because of the gap between data from different modalities. To further support this research, we extend the previous track SHREC19'MI3DOR to organize this track, and we construct the expanded monocular image based 3D object retrieval benchmark...

The Eurographics Association

Jan 1, 2020

Scene Category Protection with Back Propagation and Image Enhancement

Personal privacy is one of the essential problems in modern society. In some cases, people may not want smart computing systems to automatically identify and reveal their personal information, such as places or habits. This motivates our proposal to protect scene category recognition from photos by back-propagation...

MediaEval

Jan 1, 2019

Multimodal Personal Health Lifelog Data Analysis with Inference from Multiple Sources and Attributes

When collecting and processing data recorded by sensors for any application, noisy and missing data are an important problem that needs to be addressed. This paper presents two approaches we use to predict missing air quality data...

MediaEval

Jan 1, 2019
