DA-bench

Visual Benchmark for Data Analytics AI Agents

Run #18


DA-bench Results for Run #18 — Overall Score: 33.2% (73 / 220)
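The overall score can be checked against the per-category totals listed on this page. The sketch below assumes the overall figure is total points earned divided by total points available, not an average of category percentages. The variable names are illustrative and not part of DA-bench.

```python
# Category totals as (points earned, points available), copied from this page.
category_scores = {
    "Data Querying": (40, 115),
    "Feature Engineering": (27, 35),
    "Insight Identification": (6, 40),
    "Learning": (0, 10),
    "Visualization": (0, 15),
    "Domain Knowledge": (0, 5),
}

# Assumption: the overall score is the ratio of summed points.
earned = sum(e for e, _ in category_scores.values())
available = sum(a for _, a in category_scores.values())
overall_pct = round(100 * earned / available, 1)

print(f"Overall: {overall_pct}% ({earned} / {available})")
# prints "Overall: 33.2% (73 / 220)"
```

Under that assumption the category totals reproduce the headline figure of 33.2% (73 / 220).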

Data Querying (40 / 115)
Question Date Tested Score Video Recording

dq01
Perform an aggregation on an explicit column

July 4, 2024 - 08:43 AM 4

dq02
Perform an aggregation with an explicit table but not an inferred column

July 4, 2024 - 08:51 AM 4

dq03
Perform an aggregation with an implicit table and implicit column

July 4, 2024 - 08:58 AM 0

dq04
Find and compare information across tables without joins

July 4, 2024 - 09:06 AM 0

dq05
Work with non-literal values

July 4, 2024 - 09:10 AM 5

dq06
Work with non-literal values and non-SQL data manipulation

July 4, 2024 - 09:14 AM 5

dq07
Deal with common acronyms and more advanced aggregations

July 4, 2024 - 09:16 AM 0

dq08
Multi-step queries

July 4, 2024 - 09:23 AM 3

dq09
Aggregations with numeric predicates to filter

July 4, 2024 - 09:26 AM 0

dq10
Aggregations with categorical predicates to filter

July 4, 2024 - 09:29 AM 0

dq11
Schema review

July 4, 2024 - 09:34 AM 5

dq12
Aggregate records that are filtered with a predicate requiring a join

July 4, 2024 - 09:41 AM 0

dq13
Recognizes truly ambiguous queries.

July 4, 2024 - 09:44 AM 0

dq14
Can handle boolean features

July 4, 2024 - 09:48 AM 0

dq15
Handles ambiguous column names

July 4, 2024 - 09:52 AM 0

dq16
Understands that set operations require consideration of overlap

July 4, 2024 - 09:54 AM 0

dq17
Finds relevant values inside a column to answer questions

July 4, 2024 - 10:04 AM 0

dq18
Lookup a single record by ID

July 4, 2024 - 10:08 AM 5

dq19
Perform an aggregation by a different name and a second query from that

July 4, 2024 - 10:12 AM 3

dq20
Perform an aggregation based on a very different question name

July 4, 2024 - 10:15 AM 3

dq21
Perform a filter and an unusually-phrased aggregation in the correct order

July 4, 2024 - 10:18 AM 0

dq22
Complex query using the same table multiple times in different ways

July 4, 2024 - 10:21 AM 0

dq23
Can handle typos well

July 4, 2024 - 10:28 AM 3
Feature Engineering (27 / 35)
Question Date Tested Score Video Recording

fe1
Make an indicator feature for a criteria set

July 4, 2024 - 10:30 AM 5

fe2
Make a categorical feature from a criteria set

July 4, 2024 - 10:38 AM 5

fe3
Minmax normalization

July 4, 2024 - 10:45 AM 5

fe4
Combining two input columns

July 4, 2024 - 10:49 AM 3

fe5
Sentiment

July 4, 2024 - 10:53 AM 3

fe6
Phrase Identification in Text

July 4, 2024 - 11:05 AM 3

fe7
Advanced NLQ

July 4, 2024 - 11:11 AM 3
Insight Identification (6 / 40)
Question Date Tested Score Video Recording

ii1
Basic correlation

July 4, 2024 - 11:15 AM 0

ii2
Compare an aggregation for two distinct subsets of data

July 4, 2024 - 11:26 AM 3

ii3
Basic Correlation but requiring aggregation from another table

July 4, 2024 - 11:34 AM 0

ii4
Outlier identification

July 4, 2024 - 11:40 AM 0

ii5
Identifying basic trends on short timelines

July 4, 2024 - 11:46 AM 0

ii6
Understands statistical significance

July 4, 2024 - 11:51 AM 3

ii7
Understands derivatives

July 4, 2024 - 11:58 AM 0

ii8
Can use NLQ feature engineering as part of an insight request

July 4, 2024 - 12:05 PM 0
Learning (0 / 10)
Question Date Tested Score Video Recording

l1
Can remember the meanings of oddly-named columns

July 4, 2024 - 12:08 PM 0

l2
Can remember criteria sets under a single name

July 4, 2024 - 12:16 PM 0
Visualization (0 / 15)
Question Date Tested Score Video Recording

v1
Basic Charting

July 4, 2024 - 12:22 PM 0

v2
Charting with two series

July 4, 2024 - 12:28 PM 0

v3
Categorical Charts

July 4, 2024 - 12:35 PM 0
Domain Knowledge (0 / 5)
Question Date Tested Score Video Recording

dk01

July 4, 2024 - 08:35 AM 0