DA-bench

Visual Benchmark for Data Analytics AI Agents

cortex Run #180 (2025-01-07)

The setup score reflects Snowflake's data warehouse functionality, although the setup video demonstrates uploading CSVs. Setup also includes creating YAML and updating Streamlit code; these steps are not captured in the setup video.

Overall Score: 52.3%
(40% Setup Score + 60% Test Score)
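The weighting above can be checked with a few lines of arithmetic. This is a minimal sketch, not the benchmark's own scoring code: the function name `overall_score` and its parameters are hypothetical, and the inputs are the setup (5 / 7) and test (93 / 235) fractions reported for this run.

```python
def overall_score(setup_pct: float, test_pct: float,
                  setup_weight: float = 0.4, test_weight: float = 0.6) -> float:
    """Weighted combination: 40% setup score + 60% test score."""
    return setup_weight * setup_pct + test_weight * test_pct


# Fractions as reported for Run #180.
setup_pct = 100 * 5 / 7      # setup score, about 71.4%
test_pct = 100 * 93 / 235    # test score, about 39.6%

overall = overall_score(setup_pct, test_pct)
print(f"Overall Score: {overall:.1f}%")
```

Rounded to one decimal place, the result matches the 52.3% reported for this run.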

DA-bench Setup for Run #180 — Setup Score: 71.4% (5 / 7)

See how this tool was set up for the test.

  • Verified: Setup Less Than 20 Minutes
  • Verified: Connects to Data Warehouse
  • Verified: Handles 1TB Table
  • Verified: No Individual Upload of Files
  • Unchecked: No Python for Setup
  • Unchecked: No SQL for Setup

DA-bench Results for Run #180 — Test Score: 39.6% (93 / 235)

Data Querying (95 / 125)
20 Correct Answers, 1 Hallucination
ID    Date Tested  Score  Question
dq01  2025-01-07       5  Perform an aggregation on an explicit column
dq02  2025-01-07       5  Perform an aggregation with an explicit table but not an inferred column
dq03  2025-01-07       5  Perform an aggregation with an implicit table and implicit column
dq04  2025-01-07       5  Find and compare information across tables without joins
dq05  2025-01-07       5  Work with non-literal values
dq06  2025-01-07       5  Work with non-literal values and non-SQL data manipulation
dq07  2025-01-07       5  Deal with common acronyms and more advanced aggregations
dq08  2025-01-07       5  Multi-step queries
dq09  2025-01-07       5  Aggregations with numeric predicates to filter
dq10  2025-01-07       5  Aggregations with categorical predicates to filter
dq11  2025-01-07       5  Schema review
dq12  2025-01-07      -5  Aggregate records that are filtered with a predicate requiring a join
dq13  2025-01-07       0  Recognizes truly ambiguous queries
dq14  2025-01-07       5  Can handle boolean features
dq15  2025-01-07       5  Handles ambiguous column names
dq16  2025-01-07       5  Understands set operations require consideration of overlap
dq17  2025-01-07       5  Finds relevant values inside a column to answer questions
dq18  2025-01-07       0  Look up a single record by ID
dq19  2025-01-07       5  Perform an aggregation by a different name and a second query from that
dq20  2025-01-07       5  Perform an aggregation based on a very different question name
dq21  2025-01-07       5  Perform a filter and an unusually-phrased aggregation in the correct order
dq22  2025-01-07       0  Complex query using the same table multiple times in different ways
dq23  2025-01-07       5  Can handle incorrect column names well
dq24  2025-01-07       0  Schema review
dq25  2025-01-07       5  Work with non-literal values
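The Data Querying summary line (95 / 125, 20 correct answers, 1 hallucination) can be reproduced from the per-question scores. The sketch below rests on assumptions inferred from the numbers rather than stated by the benchmark: a positive score counts as a correct answer, a negative score counts as a hallucination, and each question is worth at most 5 points (125 / 25). The helper name `summarize` is hypothetical.

```python
# Per-question scores for dq01..dq25, copied from the Data Querying results.
dq_scores = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, -5, 0,
             5, 5, 5, 5, 0, 5, 5, 5, 0, 5, 0, 5]


def summarize(scores, max_per_question=5):
    """Tally a category. Assumption: score > 0 is a correct answer,
    score < 0 is a hallucination, and 0 is neither."""
    return {
        "total": sum(scores),
        "max": max_per_question * len(scores),
        "correct": sum(1 for s in scores if s > 0),
        "hallucinations": sum(1 for s in scores if s < 0),
    }


summary = summarize(dq_scores)
print(summary)
```

Under these assumptions a hallucination costs as many points as a correct answer earns, so a category's total can be zero or negative even when some questions were answered correctly.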

Domain Knowledge (0 / 5)
0 Correct Answers, 0 Hallucinations
ID    Date Tested  Score  Question
dk01  2025-01-07       0  Column Relevance Determination

Feature Engineering (0 / 40)
1 Correct Answer, 1 Hallucination
ID    Date Tested  Score  Question
fe1   2025-01-07       5  Make a boolean indicator feature for a criteria set
fe2   2025-01-07       0  Make a categorical feature from a criteria set
fe3   2025-01-07      -5  Minmax normalization
fe4   2025-01-07       0  Combining two input columns
fe5   2025-01-07       0  Sentiment
fe6   2025-01-07       0  Phrase Identification in Text
fe7   2025-01-07       0  Advanced NLP
fe8   2025-01-07       0  Advanced NLP

Insight Identification (0 / 40)
1 Correct Answer, 1 Hallucination
ID    Date Tested  Score  Question
ii2   2025-01-07       5  Compare an aggregation for two distinct subsets of data
ii5   2025-01-07       0  Identifying basic trends on short timelines
ii6   2025-01-07       0  Understands statistical significance
ii7   2025-01-07       0  Understands derivatives
ii8   2025-01-07      -5  Can use NLP feature engineering as part of an insight request
ii10  2025-01-07       0  Identifying data discrepancies given guidance
ii12  2025-01-07       0  Can manage planning aggregations in several tables
ii15  2025-01-07       0  Can align timestamps and filter a heavily joined table for sequenced data

Learning (-10 / 10)
0 Correct Answers, 2 Hallucinations
ID    Date Tested  Score  Question
l1    2025-01-07      -5  Can remember the meanings of oddly-named columns
l2    2025-01-07      -5  Can remember criteria sets under a single name

Visualization (8 / 15)
2 Correct Answers, 0 Hallucinations
ID    Date Tested  Score  Question
v1    2025-01-07       4  Basic Charting
v2    2025-01-07       0  Charting with two series
v3    2025-01-07       4  Categorical Charts
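
The six category totals reported for Run #180 add up to the test score. The following is a minimal sketch of that aggregation, assuming the test score is simply points earned divided by points possible across all categories; the dictionary layout and variable names are illustrative only.

```python
# (points earned, points possible) per category, as reported for Run #180.
categories = {
    "Data Querying": (95, 125),
    "Domain Knowledge": (0, 5),
    "Feature Engineering": (0, 40),
    "Insight Identification": (0, 40),
    "Learning": (-10, 10),
    "Visualization": (8, 15),
}

earned = sum(points for points, _ in categories.values())
possible = sum(maximum for _, maximum in categories.values())
test_pct = 100 * earned / possible

print(f"Test Score: {test_pct:.1f}% ({earned} / {possible})")
```

The negative Learning total lowers the aggregate, which is why the combined score sits below what Data Querying and Visualization contribute on their own.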