News: Our paper was accepted by NeurIPS 2021, and we will release the dataset and the paper soon.

A Benchmark for Situated Reasoning in Real-World Videos

What is STAR?

Reasoning in the real world is not divorced from situations. A key challenge is to capture the present knowledge from surrounding situations and reason accordingly. STAR is a novel benchmark for Situated Reasoning, which provides challenging question-answering tasks, symbolic situation descriptions and logic-grounded diagnosis via real-world video situations.

You are welcome to use the STAR Challenge leaderboard for evaluation.


The dataset consists of four question types for situated reasoning: Interaction, Sequence, Prediction and Feasibility. Video situations are decomposed into bottom-up hypergraphs of atomic entities and relations (e.g., actions, objects, and relationships). Questions are procedurally generated by functional programs that operate on the situation hypergraphs.
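To make the hypergraph-and-program idea more concrete, here is a minimal sketch in Python. It is not the STAR generation code: the `Relation` and `Situation` classes, the helper functions, the question wording and the sample data are all hypothetical, chosen only to show how a small functional program could be composed over per-frame situation records to produce an Interaction-style question.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical relationship triple: subject -[predicate]-> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Situation:
    """Hypothetical per-keyframe situation record (not the STAR file format)."""
    frame_id: str
    relations: List[Relation] = field(default_factory=list)
    actions: List[Tuple[str, str]] = field(default_factory=list)  # (verb, object)


def filter_actions(situations: List[Situation], verb: str) -> List[Tuple[str, str, str]]:
    """Program step 1: keep (frame_id, verb, object) hits whose verb matches."""
    return [(s.frame_id, v, o) for s in situations for (v, o) in s.actions if v == verb]


def query_objects(hits: List[Tuple[str, str, str]]) -> List[str]:
    """Program step 2: collect the distinct objects of the hits, in first-seen order."""
    seen: List[str] = []
    for _, _, obj in hits:
        if obj not in seen:
            seen.append(obj)
    return seen


def generate_interaction_qa(situations: List[Situation], verb: str) -> Optional[Dict[str, str]]:
    """Compose the steps; return None when the answer is missing or ambiguous."""
    objects = query_objects(filter_actions(situations, verb))
    if len(objects) != 1:
        return None
    return {"question": f"Which object did the person {verb}?", "answer": objects[0]}


# Made-up two-frame clip, for illustration only.
situations = [
    Situation("000010", [Relation("person", "holding", "cup")], [("take", "cup")]),
    Situation("000020", [Relation("person", "in_front_of", "table")], [("put_down", "cup")]),
]

qa = generate_interaction_qa(situations, "take")
print(qa)
```

Because each step is a plain function over the situation records, the same pieces could be recombined for other question templates, and a question is only emitted when the program yields exactly one answer.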

More Examples

Download & Repository

STAR Overview

Question Types:

  • Interaction Question
  • Sequence Question
  • Prediction Question
  • Feasibility Question

23K Situation Video Clips

61K Situated Questions

141K Situation Hypergraphs

Annotation Statistics

  • 111 action classes
  • 29 entity classes
  • 24 relationship classes

Data Download

Questions, Answers and Situation Graphs

Train json
Val json
Test json
Train/Val/Test Split File json
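Once a question file is downloaded, a quick sanity check is to tally the questions by type. The sketch below assumes, without confirmation from this page, that the file is a JSON list of records and that each record has a `question_id` field whose text before the first underscore names the question type; the sample records are made up. Check the actual file and adjust the field name if needed.

```python
import json
from collections import Counter


def count_question_types(records):
    """Tally records by the prefix of `question_id` (assumed field and layout)."""
    return Counter(r["question_id"].split("_", 1)[0] for r in records)


# Stand-in for reading the downloaded file with json.load(...).
# These three records are invented for illustration.
sample = """[
  {"question_id": "Interaction_T1_0"},
  {"question_id": "Sequence_T2_5"},
  {"question_id": "Interaction_T3_9"}
]"""

records = json.loads(sample)
counts = count_question_types(records)
print(counts)
```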

Question-Answer Templates and Programs

Question Templates csv
QA Programs csv

Situation Video Data

Video Segments csv
Video Keyframe IDs csv
Charades Videos (scaled to 480p) mp4
ActionGenome Frame Dumping Tool


Classes Files zip
Object Bounding Boxes pkl
Human Poses zip
Human Bounding Boxes pkl

STAR Code and Scripts

The code of the STAR benchmark is available on GitHub. With this code you can:

Visualize the STAR questions, options, and situation graphs

QA Visualization Script

Generate new STAR questions for situation video clips

QA Generation Code


The paper will be available soon.

Link to Paper
@inproceedings{wu2021star_situated_reasoning,
  author    = {Wu, Bo and Yu, Shoubin and Chen, Zhenfang and Tenenbaum, Joshua B and Gan, Chuang},
  title     = {STAR: A Benchmark for Situated Reasoning in Real-World Videos},
  booktitle = {Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2021}
}


Bo Wu
MIT-IBM Watson AI Lab

Shoubin Yu
Shanghai Jiao Tong University

Zhenfang Chen
MIT-IBM Watson AI Lab

Joshua B. Tenenbaum

Chuang Gan
MIT-IBM Watson AI Lab