DA-bench

Visual Benchmark for Data Analytics AI Agents

unsupervised Run #216 (2025-02-14)

Overall Score: 72.1%
(40% Scalability Score + 60% Test Score)
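
As a rough check of the weighting above, the sketch below recomputes the overall score from the two component figures reported for this run (7 / 7 scalability checks and 150 / 280 test points). The function name `overall_score` is illustrative and not part of DA-bench, and the snippet assumes the weights are applied to the two percentages exactly as the formula states.

```python
def overall_score(scalability_pct: float, test_pct: float) -> float:
    """Weighted overall score: 40% scalability plus 60% test (both in percent)."""
    return 0.4 * scalability_pct + 0.6 * test_pct


# Component figures as reported for Run #216.
scalability = 100.0 * 7 / 7   # 7 of 7 setup checks verified
test = 100.0 * 150 / 280      # 150 of 280 test points earned

overall = overall_score(scalability, test)
print(f"{overall:.2f}")  # prints 72.14
```

Under these assumptions the result agrees with the reported 72.1% to one decimal place.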

DA-bench Setup for Run #216 — Scalability Score: 100.0% (7 / 7)

See how this tool was set up for the test.

  • Verified: Setup Less Than 20 Minutes
  • Verified: Connects to Data Warehouse
  • Verified: Handles 1TB Table
  • Verified: Handles 10+ Tables
  • Verified: No Table Structure Changes
  • Verified: No SQL Expertise for Setup

DA-bench Results for Run #216 — Test Score: 53.6% (150 / 280)

Data Querying (130 / 170)
27 Correct Answers, 1 Hallucination
Question  Date Tested  Overall Score  Description
dq01      2025-02-14   5              Perform an aggregation on an explicit column
dq02      2025-02-15   5              Perform an aggregation with an explicit table but not an inferred column
dq03      2025-02-16   5              Perform an aggregation with an implicit table and implicit column
dq04      2025-02-17   5              Find and compare information across tables without joins
dq05      2025-02-18   5              Work with non-literal values
dq06      2025-02-19   5              Work with non-literal values and non-SQL data manipulation
dq07      2025-02-20   5              Deal with common acronyms and more advanced aggregations
dq08      2025-02-21   5              Multi-step queries
dq09      2025-02-22   5              Aggregations with numeric predicates to filter
dq10      2025-02-23   5              Aggregations with categorical predicates to filter
dq11      2025-02-24   5              Schema review
dq12      2025-02-25   0              Aggregate records that are filtered with a predicate requiring a join
dq13      2025-02-26   5              Recognizes truly ambiguous queries
dq14      2025-02-27   5              Can handle boolean features
dq15      2025-02-28   5              Handles ambiguous column names
dq16      2025-03-01   5              Understands that set operations require consideration of overlap
dq17      2025-03-02   5              Finds relevant values inside a column to answer questions
dq18      2025-03-03   5              Look up a single record by ID
dq19      2025-03-04   5              Perform an aggregation by a different name and a second query from that
dq20      2025-03-05   5              Perform an aggregation based on a very different question name
dq21      2025-03-06   0              Perform a filter and an unusually-phrased aggregation in the correct order
dq22      2025-03-07   0              Complex query using the same table multiple times in different ways
dq23      2025-03-08   5              Can handle incorrect column names well
dq24      2025-03-09   5              Schema review
dq25      2025-03-10   5              Work with non-literal values
dq26      2025-03-11   -5             Multiple filters for the same tables but separate joins
dq27      2025-03-12   5              Multiple joins for the same tables across filters and stats
dq28      2025-03-13   5              Multiple joins for the same tables across filters and stats
dq29      2025-03-14   0              Complex query using window function
dq30      2025-03-15   5              Perform on different perspectives
dq31      2025-03-16   0              Month extraction and aggregation
dq32      2025-03-17   5              Handles typos with transposed and missing letters
dq33      2025-03-18   5              Negative answer case
dq34      2025-03-19   0              Negative answer case

Domain Knowledge (5 / 5)
1 Correct Answer, 0 Hallucinations
Question  Date Tested  Overall Score  Description
dk01      2025-03-20   5              Column Relevance Determination

Feature Engineering (5 / 40)
3 Correct Answers, 2 Hallucinations
Question  Date Tested  Overall Score  Description
fe1       2025-03-21   5              Make a boolean indicator feature for a criteria set
fe2       2025-03-22   0              Make a categorical feature from a criteria set
fe3       2025-03-23   5              Minmax normalization
fe4       2025-03-24   5              Combining two input columns
fe5       2025-03-25   0              Sentiment
fe6       2025-03-26   -5             Phrase Identification in Text
fe7       2025-03-27   -5             Advanced NLP
fe8       2025-03-28   0              Advanced NLP

Insight Identification (10 / 40)
2 Correct Answers, 0 Hallucinations
Question  Date Tested  Overall Score  Description
ii2       2025-03-29   5              Compare an aggregation for two distinct subsets of data
ii5       2025-03-30   0              Identifying basic trends on short timelines
ii6       2025-03-31   0              Understands statistical significance
ii7       2025-04-01   5              Understands derivatives
ii8       2025-04-02   0              Can use NLP feature engineering as part of an insight request
ii10      2025-04-03   0              Identifying data discrepancies given guidance
ii12      2025-04-04   0              Can manage planning aggregations in several tables
ii15      2025-04-05   0              Can align timestamps and filter a heavily joined table for sequenced data

Learning (0 / 10)
1 Correct Answer, 1 Hallucination
Question  Date Tested  Overall Score  Description
l1        2025-04-06   5              Can remember the meanings of oddly-named columns
l2        2025-04-07   -5             Can remember criteria sets under a single name

Visualization (0 / 15)
0 Correct Answers, 0 Hallucinations
Question  Date Tested  Overall Score  Description
v1        2025-04-08   0              Basic Charting
v2        2025-04-09   0              Charting with two series
v3        2025-04-10   0              Categorical Charts
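
The category totals can be cross-checked against the reported Test Score of 53.6% (150 / 280). The sketch below assumes each question is worth 5 points, that a correct answer earns 5, a hallucination costs 5, and any other outcome earns 0. This scoring rule is inferred from the per-question scores listed in this run and is not a documented DA-bench rule; the variable names are illustrative.

```python
# Assumed point values, inferred from the per-question scores (5, 0, -5).
POINTS_CORRECT = 5
POINTS_HALLUCINATION = -5

# category: (correct answers, hallucinations, questions, reported score)
categories = {
    "Data Querying": (27, 1, 34, 130),
    "Domain Knowledge": (1, 0, 1, 5),
    "Feature Engineering": (3, 2, 8, 5),
    "Insight Identification": (2, 0, 8, 10),
    "Learning": (1, 1, 2, 0),
    "Visualization": (0, 0, 3, 0),
}

earned = 0
possible = 0
for name, (correct, hallucinated, questions, reported) in categories.items():
    score = correct * POINTS_CORRECT + hallucinated * POINTS_HALLUCINATION
    # Each recomputed category score should match the figure on the page.
    assert score == reported, name
    earned += score
    possible += questions * POINTS_CORRECT

test_score = 100.0 * earned / possible
print(earned, possible, f"{test_score:.1f}%")  # prints 150 280 53.6%
```

Under the assumed scoring rule, every category total and the overall Test Score reproduce the reported figures.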