DA-bench

Visual Benchmark for Data Analytics AI Agents

cortex Run #149 (December 4, 2024)

Overall Score: 23.7%
(40% Setup Score + 60% Test Score)
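
The weighting above can be checked with a short sketch. The function name `overall_score` and its parameters are illustrative and not part of DA-bench; the sketch assumes both component scores are expressed on a 0 to 100 scale.

```python
def overall_score(setup_score: float, test_score: float) -> float:
    """Weighted combination: 40% Setup Score + 60% Test Score.

    Assumes both inputs are percentages on a 0-100 scale.
    """
    return 0.4 * setup_score + 0.6 * test_score


# Run #149 reports a Setup Score of 0.0 and a Test Score of 39.5.
print(round(overall_score(0.0, 39.5), 1))  # prints 23.7
```

With a Setup Score of zero, the overall score is simply 60% of the Test Score.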

DA-bench Setup for Run #149 — Setup Score: 0.0 (0 / 6)

See how this tool was set up for the test.

  • Connects to Data Warehouse (unchecked)
  • No Individual Upload of Files (unchecked)
  • No SQL for Setup (unchecked)
  • Setup Less Than Ten Minutes (unchecked)

DA-bench Results for Run #149 — Test Score: 39.5 (85 / 215)

Data Querying (75 / 120)
ID    Date Tested       Overall Score  Question
dq01  December 4, 2024              5  Perform an aggregation on an explicit column
dq02  December 4, 2024              5  Perform an aggregation with an explicit table but not an inferred column
dq03  December 4, 2024              5  Perform an aggregation with an implicit table and implicit column
dq04  December 4, 2024             -5  Find and compare information across tables without joins
dq05  December 4, 2024              5  Work with non-literal values
dq06  December 4, 2024              5  Work with non-literal values and non-SQL data manipulation
dq07  December 4, 2024              5  Deal with common acronyms and more advanced aggregations
dq08  December 4, 2024              0  Multi-step queries
dq09  December 4, 2024              5  Aggregations with numeric predicates to filter
dq10  December 4, 2024              5  Aggregations with categorical predicates to filter
dq11  December 4, 2024              5  Schema review
dq12  December 4, 2024              5  Aggregate records that are filtered with a predicate requiring a join
dq13  December 4, 2024             -5  Recognizes truly ambiguous queries
dq14  December 4, 2024              5  Can handle boolean features
dq15  December 4, 2024              5  Handles ambiguous column names
dq16  December 4, 2024              5  Understands set operations require consideration of overlap
dq17  December 4, 2024              5  Finds relevant values inside a column to answer questions
dq18  December 4, 2024              5  Lookup a single record by ID
dq19  December 4, 2024              5  Perform an aggregation by a different name and a second query from that
dq20  December 4, 2024              5  Perform an aggregation based on a very different question name
dq21  December 4, 2024              0  Perform a filter and an unusually-phrased aggregation in the correct order
dq23  December 4, 2024              5  Can handle incorrect column names well
dq24  December 4, 2024              0  Schema review
dq25  December 4, 2024             -5  Work with non-literal values

Domain Knowledge (5 / 5)
ID    Date Tested       Overall Score  Question
dk01  December 4, 2024              5  Column Relevance Determination

Feature Engineering (-10 / 40)
ID    Date Tested       Overall Score  Question
fe1   December 4, 2024              5  Make a boolean indicator feature for a criteria set
fe2   December 4, 2024              0  Make a categorical feature from a criteria set
fe3   December 4, 2024             -5  Minmax normalization
fe4   December 4, 2024             -5  Combining two input columns
fe5   December 5, 2024             -5  Sentiment
fe6   December 4, 2024              0  Phrase Identification in Text
fe7   December 5, 2024              0  Advanced NLQ
fe8   December 4, 2024              0  Advanced NLQ

Insight Identification (10 / 25)
ID    Date Tested       Overall Score  Question
ii2   December 4, 2024              5  Compare an aggregation for two distinct subsets of data
ii5   December 4, 2024              5  Identifying basic trends on short timelines
ii6   December 4, 2024              0  Understands statistical significance
ii7   December 4, 2024              5  Understands derivatives
ii8   December 4, 2024             -5  Can use NLQ feature engineering as part of an insight request

Learning (0 / 10)
ID    Date Tested       Overall Score  Question
l1    December 4, 2024             -5  Can remember the meanings of oddly-named columns
l2    December 4, 2024              5  Can remember criteria sets under a single name

Visualization (5 / 15)
ID    Date Tested       Overall Score  Question
v1    December 4, 2024              5  Basic Charting
v2    December 4, 2024              0  Charting with two series
v3    December 4, 2024              0  Categorical Charts
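
The reported Test Score of 39.5 (85 / 215) is consistent with summing the category subtotals listed in the results. The sketch below only re-adds those published subtotals. The variable names are illustrative, and treating the Test Score as points earned divided by points available is an assumption inferred from the "(85 / 215)" notation, not a documented DA-bench formula.

```python
# (points earned, points available) per category, copied from the results above.
categories = {
    "Data Querying": (75, 120),
    "Domain Knowledge": (5, 5),
    "Feature Engineering": (-10, 40),
    "Insight Identification": (10, 25),
    "Learning": (0, 10),
    "Visualization": (5, 15),
}

earned = sum(points for points, _ in categories.values())
available = sum(maximum for _, maximum in categories.values())

# Assumed formula: Test Score = earned / available, expressed as a percentage.
test_score = round(100 * earned / available, 1)

print(earned, available, test_score)  # prints 85 215 39.5
```

Because individual questions can score below zero, a category subtotal can be negative, as with Feature Engineering here.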