STAR: A Benchmark for Situated Reasoning in Real-World Videos

What is STAR?

Reasoning in the real world is not divorced from situations. A key challenge is to capture the present knowledge from surrounding situations and reason accordingly. STAR is a novel benchmark for Situated Reasoning, which provides challenging question-answering tasks, symbolic situation descriptions and logic-grounded diagnosis via real-world video situations.

You are welcome to evaluate your models on the STAR Evaluation and check the results on the STAR Challenge Leaderboard.

Preview

The dataset consists of four question types for situated reasoning: Interaction, Sequence, Prediction, and Feasibility. Video situations are decomposed bottom-up into hypergraphs of atomic entities and relations (e.g., actions, objects, and relationships). Questions are generated procedurally by functional programs that operate on the situation hypergraphs.
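To make the idea of program-based question generation concrete, here is a minimal Python sketch. It is not the STAR generation code: the `Frame` structure, the two program steps, the question template, and the toy situation are all hypothetical and only illustrate how a question and its answer can be derived from the same graph.

```python
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    """Hypothetical keyframe of a situation graph (not the STAR schema)."""
    frame_id: int
    relations: frozenset  # (subject, predicate, object) triplets
    actions: frozenset    # (verb, object) pairs, standing in for hyperedges


def filter_actions_with_object(frames, obj):
    """Program step 1: collect (frame_id, verb) for actions involving obj."""
    return [
        (f.frame_id, verb)
        for f in frames
        for verb, target in sorted(f.actions)
        if target == obj
    ]


def query_earliest_verb(matches) -> Optional[str]:
    """Program step 2: return the verb of the earliest matching action."""
    return min(matches)[1] if matches else None


# Toy situation, invented for illustration only.
frames = [
    Frame(0, frozenset({("person", "in_front_of", "table")}), frozenset()),
    Frame(1, frozenset({("person", "holding", "book")}),
          frozenset({("take", "book")})),
    Frame(2, frozenset({("person", "sitting_on", "sofa")}),
          frozenset({("sit_on", "sofa")})),
]

# Hypothetical template; the question text and the answer come from one graph.
template = "What did the person do with the {obj}?"
question = template.format(obj="book")
answer = query_earliest_verb(filter_actions_with_object(frames, "book"))

print(question, "->", answer)
```

Because the answer is computed by executing the program on the graph, each question can be traced back to the entities and relations that support it.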

Download & Repository

STAR Overview

Question Types:

Interaction Question
Sequence Question
Prediction Question
Feasibility Question

22K Situation Video Clips

60K Situated Questions

140K Situation Hypergraphs

Annotation Statistics

111 action classes
37 entity classes
24 relationship classes

Data Download

Questions, Answers and Situation Graphs

Train json
Val json
Test json
Train/Val/Test Split File json
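Once a split file is downloaded, a quick sanity check is to tally questions by type. The sketch below assumes each record is a JSON object with a `question_id` field whose prefix names the question type (for example `Interaction_...`); that layout is an assumption made here, not something this page states, so check it against the actual files. The inline sample stands in for a real split file.

```python
import json
from collections import Counter


def count_question_types(records):
    """Tally records by the prefix of an assumed `question_id` field."""
    return Counter(r["question_id"].split("_", 1)[0] for r in records)


# Invented stand-in for a split file; the field names are assumptions.
sample = json.loads("""
[
  {"question_id": "Interaction_T1_0", "question": "q0", "answer": "a0"},
  {"question_id": "Interaction_T2_1", "question": "q1", "answer": "a1"},
  {"question_id": "Sequence_T1_2",    "question": "q2", "answer": "a2"},
  {"question_id": "Feasibility_T3_3", "question": "q3", "answer": "a3"}
]
""")

counts = count_question_types(sample)
for question_type, n in sorted(counts.items()):
    print(f"{question_type}: {n}")
```

For a downloaded file, replace the inline sample with `json.load` on the opened file; the local path depends on where you saved it.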

Question-Answer Templates and Programs

Question Templates csv
QA Programs csv

Situation Video Data

Video Segments csv
Video Keyframe IDs csv
Raw Videos from Charades (scaled to 480p) mp4
Keyframe Dumping Tool from Action Genome

Annotations

Classes Files zip
Object Bounding Boxes pkl
Human Poses zip
Human Bounding Boxes pkl

Download from Baidu Yunpan (百度云盘)

Data Download Access Code: 6v8u

STAR Code and Scripts

The code of the STAR benchmark is available on GitHub. With this code you can:

Visualize the STAR questions, options, and situation graphs

QA Visualization Script

Generate new STAR questions for situations

QA Generation Code

Paper

Link to Paper

@inproceedings{wu2021star_situated_reasoning,
  author = {Wu, Bo and Yu, Shoubin and Chen, Zhenfang and Tenenbaum, Joshua B and Gan, Chuang},
  title = {STAR: A Benchmark for Situated Reasoning in Real-World Videos},
  booktitle = {Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS)},
  year = {2021}
}