Kohsuke Ide

I'm a Research Scientist at AIST and a Master's student in Computer Science at the University of Tsukuba, where I work in the Satoh Lab. My research focuses on computer vision, particularly the intersection of Vision-Language Models (VLMs) and 3D representation.

I received my B.Sc. in Applied Mechanics and Aerospace Engineering (Minor: Computer Science) from Waseda University. Previously, I worked at Preferred Networks on 3D reconstruction and free-viewpoint video technologies.

Email / CV / Scholar / LinkedIn / GitHub / X

Research

I'm interested in computer vision, 3D understanding, and vision-language models. My research focuses on learning 3D representations and understanding spatial relationships using large language models.

Beyond Single Object: Learning 3D Relations with Large Language Models
Kohsuke Ide, Ryousuke Yamada, Yue Qiu, Xianzheng Ma, Yoshihiro Fukuhara, Hirokatsu Kataoka, Yutaka Satoh
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026
project page / arXiv / code / press release (jp)
Investigating how large language models can learn and reason about 3D spatial relationships between multiple objects.

3D sans 3D Scans: Scalable Pre-training from Video-Generated Point Clouds
Ryousuke Yamada, Kohsuke Ide, Yoshihiro Fukuhara, Hirokatsu Kataoka, Gilles Puy, Andrei Bursuc, Yuki M. Asano
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
project page / arXiv / code
A scalable 3D pre-training approach that generates point clouds from videos, eliminating the need for expensive 3D scans.

Seeing Red, Thinking Bad: Color Bias in Vision Language Models
Kohsuke Ide, Ryousuke Yamada, Yoshihiro Fukuhara, Hirokatsu Kataoka, Yutaka Satoh
International Conference on Pattern Recognition (ICPR), 2026
Investigating how color biases in vision-language models affect their reasoning and decision-making.

Colors You Can't See, Semantic Biases You Can't Ignore
Kohsuke Ide, Ryousuke Yamada, Yoshihiro Fukuhara, Hirokatsu Kataoka, Yutaka Satoh
The IEEE/CVF International Conference on Computer Vision (ICCV) Workshop on MMRAgI, 2025
Studying the sensitivity of vision-language models to the visual styling of text.

Experience

National Institute of Advanced Industrial Science and Technology (AIST)
Researcher, Jan 2024 - Present
Computer vision and pattern analysis research, focusing on VLMs and 3D representation.

Preferred Networks (PFN)
Researcher, Aug 2024 - Mar 2025
3D reconstruction and free-viewpoint video technologies.

LightBlue Technology
ML Engineer, Sep 2021 - Mar 2024
Developed lightweight object tracking algorithms for edge devices.

M3
MLOps Engineer, Sep 2023 - Oct 2023
Developed an open-source solution for automatic Kubernetes OOM recovery.

Awards & Projects

MIRU 2025 Interactive Presentation Award (Top 4%) - "Can 3D Large Language Models Count to Three?"

Winner of the 2023 Hackathon at WINC - Pushups Counter: a workout helper using computer vision on edge devices.

Academic Service

Workshop Organizer - Workshop on Visual General Intelligence: Vision Research Toward the AGI Era, CVPR 2026
