DA-bench

Visual Benchmark for Data Analytics AI Agents

Run #15


DA-bench Results for Run #15 — Overall Score: 27.1% (57 / 210)

Data Querying (40 / 105)
| Question | Date Tested | Score |
| --- | --- | --- |
| dq01: Perform an aggregation on an explicit column | June 13, 2024 - 11:50 AM | 3 |
| dq02: Perform an aggregation with an explicit table but not an inferred column | June 13, 2024 - 12:35 AM | 5 |
| dq03: Perform an aggregation with an implicit table and implicit column | June 13, 2024 - 12:40 AM | 0 |
| dq04: Find and compare information across tables without joins | June 13, 2024 - 12:45 AM | 0 |
| dq05: Work with non-literal values | June 13, 2024 - 12:49 AM | 5 |
| dq06: Work with non-literal values and non-SQL data manipulation | June 13, 2024 - 12:52 AM | 5 |
| dq07: Deal with common acronyms and more advanced aggregations | June 13, 2024 - 12:54 AM | 5 |
| dq08: Multi-step queries | June 13, 2024 - 12:59 AM | 5 |
| dq09: Aggregations with numeric predicates to filter | June 13, 2024 - 1:01 AM | 0 |
| dq10: Aggregations with categorical predicates to filter | June 13, 2024 - 1:04 AM | 0 |
| dq11: Schema review | June 13, 2024 - 1:12 AM | 5 |
| dq12: Aggregate records that are filtered with a predicate requiring a join | June 13, 2024 - 1:20 AM | 0 |
| dq13: Recognizes truly ambiguous queries | June 13, 2024 - 1:35 AM | 0 |
| dq14: Can handle boolean features | June 13, 2024 - 1:46 AM | 0 |
| dq15: Handles ambiguous column names | June 13, 2024 - 1:50 AM | 0 |
| dq16: Understands that set operations require consideration of overlap | June 13, 2024 - 1:55 AM | 0 |
| dq17: Finds relevant values inside a column to answer questions | June 13, 2024 - 1:59 AM | 0 |
| dq18: Look up a single record by ID | June 13, 2024 - 2:01 AM | 3 |
| dq19: Perform an aggregation by a different name and a second query from that | June 13, 2024 - 2:05 AM | 0 |
| dq20: Perform an aggregation based on a very different question name | June 13, 2024 - 2:08 AM | 4 |
| dq21: Perform a filter and an unusually-phrased aggregation in the correct order | June 13, 2024 - 2:10 AM | 0 |

Visualization (0 / 15)
| Question | Date Tested | Score |
| --- | --- | --- |
| v1: Basic Charting | June 13, 2024 - 2:15 AM | 0 |
| v2: Charting with two series | June 13, 2024 - 2:20 AM | 0 |
| v3: Categorical Charts | June 13, 2024 - 2:28 AM | 0 |

Feature Engineering (14 / 35)
| Question | Date Tested | Score |
| --- | --- | --- |
| fe1: Make an indicator feature for a criteria set | June 13, 2024 - 2:40 AM | 3 |
| fe2: Make a categorical feature from a criteria set | June 13, 2024 - 3:07 AM | 0 |
| fe3: Min-max normalization | June 13, 2024 - 3:12 AM | 0 |
| fe4: Combining two input columns | June 13, 2024 - 3:16 AM | 0 |
| fe5: Sentiment | June 13, 2024 - 3:20 AM | 3 |
| fe6: Phrase Identification in Text | June 13, 2024 - 3:36 AM | 3 |
| fe7: Advanced NLQ | June 13, 2024 - 3:46 AM | 5 |

Insight Identification (3 / 40)
| Question | Date Tested | Score |
| --- | --- | --- |
| ii1: Basic correlation | June 13, 2024 - 3:53 AM | 0 |
| ii2: Compare an aggregation for two distinct subsets of data | June 13, 2024 - 4:01 AM | 0 |
| ii3: Basic correlation but requiring aggregation from another table | June 13, 2024 - 4:10 AM | 0 |
| ii4: Outlier identification | June 13, 2024 - 4:15 AM | 3 |
| ii5: Identifying basic trends on short timelines | June 13, 2024 - 4:20 AM | 0 |
| ii6: Understands statistical significance | June 13, 2024 - 4:24 AM | 0 |
| ii7: Understands derivatives | June 13, 2024 - 4:28 AM | 0 |
| ii8: Can use NLQ feature engineering as part of an insight request | June 13, 2024 - 4:30 AM | 0 |

Learning (0 / 10)
| Question | Date Tested | Score |
| --- | --- | --- |
| l1: Can remember the meanings of oddly-named columns | June 13, 2024 - 4:35 AM | 0 |
| l2: Can remember criteria sets under a single name | June 13, 2024 - 4:39 AM | 0 |

Domain Knowledge (0 / 5)
| Question | Date Tested | Score |
| --- | --- | --- |
| dk01: Column Relevance Determination | June 13, 2024 - 4:45 AM | 0 |

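
The overall score in the run heading can be cross-checked against the category subtotals. The following is a minimal sketch of that arithmetic, assuming the overall score is simply total points earned divided by total points available; the dictionary and the function name `overall_score` are illustrative and not part of DA-bench itself.

```python
# Category subtotals as (points earned, points available), copied from the
# section headings of Run #15. The data structure itself is an illustration.
category_scores = {
    "Data Querying": (40, 105),
    "Visualization": (0, 15),
    "Feature Engineering": (14, 35),
    "Insight Identification": (3, 40),
    "Learning": (0, 10),
    "Domain Knowledge": (0, 5),
}


def overall_score(scores):
    """Return (earned, available, percent) summed across all categories.

    Assumption: the overall score is an unweighted sum of points, which is
    consistent with the figures reported for this run.
    """
    earned = sum(e for e, _ in scores.values())
    available = sum(a for _, a in scores.values())
    return earned, available, 100 * earned / available


earned, available, percent = overall_score(category_scores)
print(f"Overall Score: {percent:.1f}% ({earned} / {available})")
# prints: Overall Score: 27.1% (57 / 210)
```

The subtotals sum to 57 of 210 points, which matches the 27.1% reported for the run.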