DA-bench

Visual Benchmark for Data Analytics AI Agents

Run #16


DA-bench Results for Run #16 — Overall Score: 51.1% (115 / 225)

Data Querying (88 / 115)
Question Date Tested Score Video Recording

dq01
Perform an aggregation on an explicit column

June 28, 2024 - 8:47AM 5
 

dq02
Perform an aggregation with an explicit table but an inferred column

June 28, 2024 - 8:53AM 5
 

dq03
Perform an aggregation with an implicit table and implicit column

June 28, 2024 - 8:57AM 5
 

dq04
Find and compare information across tables without joins

June 28, 2024 - 9:02AM 5
 

dq05
Work with non-literal values

June 28, 2024 - 9:08AM 5
 

dq06
Work with non-literal values and non-SQL data manipulation

June 28, 2024 - 9:12AM 4
 

dq07
Deal with common acronyms and more advanced aggregations

June 28, 2024 - 9:20AM 0
 

dq08
Multi-step queries

June 28, 2024 - 9:25AM 4
 

dq09
Aggregations with numeric predicates to filter

June 28, 2024 - 9:31AM 5
 

dq10
Aggregations with categorical predicates to filter

June 28, 2024 - 9:35AM 5
 

dq11
Schema review

June 28, 2024 - 9:40AM 5
 

dq12
Aggregate records that are filtered with a predicate requiring a join

June 28, 2024 - 9:45AM 0
 

dq13
Recognizes truly ambiguous queries.

June 28, 2024 - 10:05AM 0
 

dq14
Can handle boolean features

June 28, 2024 - 10:15AM 5
 

dq15
Handles ambiguous column names

June 28, 2024 - 10:19AM 0
 

dq16
Understands that set operations require consideration of overlap

June 28, 2024 - 10:24AM 0

dq17
Finds relevant values inside a column to answer questions

June 28, 2024 - 10:36AM 5

dq18
Lookup a single record by ID

June 28, 2024 - 10:42AM 5

dq19
Perform an aggregation by a different name and a second query from that

June 28, 2024 - 10:48AM 5

dq20
Perform an aggregation based on a very different question name

June 28, 2024 - 10:51AM 5

dq21
Perform a filter and an unusually-phrased aggregation in the correct order

June 28, 2024 - 10:55AM 5

dq22
Complex query using the same table multiple times in different ways

June 28, 2024 - 11:02AM 5

dq23
Can handle typos well

June 28, 2024 - 11:10AM 5

Feature Engineering (0 / 35)
Question Date Tested Score Video Recording

fe1
Make an indicator feature for a criteria set

June 28, 2024 - 11:39AM 0

fe2
Make a categorical feature from a criteria set

June 28, 2024 - 11:51AM 0

fe3
Minmax normalization

June 28, 2024 - 12:06PM 0

fe4
Combining two input columns

June 28, 2024 - 12:16PM 0

fe5
Sentiment

June 28, 2024 - 12:23PM 0

fe6
Phrase Identification in Text

June 28, 2024 - 12:28PM 0

fe7
Advanced NLQ

June 28, 2024 - 12:33PM 0

Insight Identification (22 / 45)
Question Date Tested Score Video Recording

ii1
Basic correlation

June 28, 2024 - 12:41PM 2

ii2
Compare an aggregation for two distinct subsets of data

June 28, 2024 - 12:57PM 3

ii3
Basic correlation, but requiring aggregation from another table

June 28, 2024 - 1:07PM 0

ii4
Outlier identification

June 28, 2024 - 1:14PM 3

ii5
Identifying basic trends on short timelines

June 28, 2024 - 1:20PM 5

ii6
Understands statistical significance

June 28, 2024 - 1:25PM 4

ii7
Understands derivatives

June 28, 2024 - 1:32PM 0

ii8
Can use NLQ feature engineering as part of an insight request

June 28, 2024 - 1:40PM 5

ii9
Identifying data discrepancies given guidance

June 28, 2024 - 1:48PM 0

Learning (0 / 10)
Question Date Tested Score Video Recording

l1
Can remember the meanings of oddly-named columns

June 28, 2024 - 1:55PM 0

l2
Can remember criteria sets under a single name

June 28, 2024 - 2:00PM 0

Visualization (0 / 15)
Question Date Tested Score Video Recording

v1
Basic Charting

June 28, 2024 - 2:05PM 0

v2
Charting with two series

June 28, 2024 - 2:14PM 0

v3
Categorical Charts

June 28, 2024 - 2:20PM 0

Domain Knowledge (5 / 5)
Question Date Tested Score Video Recording

dk01
Column Relevance Determination

June 28, 2024 - 8:35AM 5
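
The overall score for Run #16 appears to be the sum of the category subtotals listed above divided by the total points available. The sketch below checks that arithmetic. It assumes simple unweighted summation across categories, which the reported figures are consistent with but which the page does not state explicitly; the names `category_scores` and `overall_score` are hypothetical and used only for illustration.

```python
# Category subtotals as reported for Run #16: (points earned, points possible).
category_scores = {
    "Data Querying": (88, 115),
    "Feature Engineering": (0, 35),
    "Insight Identification": (22, 45),
    "Learning": (0, 10),
    "Visualization": (0, 15),
    "Domain Knowledge": (5, 5),
}


def overall_score(scores):
    """Return (earned, possible, percent) summed over all categories.

    Assumes every category contributes its raw points with no weighting.
    """
    earned = sum(e for e, _ in scores.values())
    possible = sum(p for _, p in scores.values())
    percent = round(100 * earned / possible, 1)
    return earned, possible, percent


earned, possible, percent = overall_score(category_scores)
print(f"Overall Score: {percent}% ({earned} / {possible})")
# prints "Overall Score: 51.1% (115 / 225)"
```

Under this assumption the result matches the headline figure, so the per-category subtotals and the overall score are mutually consistent.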