DA-bench

Visual Benchmark for Data Analytics AI Agents

databricks Run #101 (2024-10-28)

Overall Score: 50.0%
(40% Scalability Score + 60% Test Score)
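
The weighting above can be checked with a short calculation. The following is a minimal sketch in Python, assuming both component scores are expressed as percentages and using the figures reported for this run (7 / 7 setup checks, 34 / 205 test points). The function name `overall_score` is illustrative and not part of DA-bench.

```python
def overall_score(scalability_pct: float, test_pct: float) -> float:
    """Weighted overall score: 40% scalability + 60% test (weights from the report)."""
    return 0.40 * scalability_pct + 0.60 * test_pct


# Figures reported for Run #101.
scalability = 100.0 * 7 / 7   # 7 of 7 setup checks verified
test = 100.0 * 34 / 205       # 34 of 205 possible test points

result = overall_score(scalability, test)
print(f"{result:.1f}%")  # prints "50.0%"
```

Because the scalability component carries 40% of the weight, a tool that passes every setup check starts at 40% before any test question is scored.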

DA-bench Setup for Run #101 — Scalability Score: 100.0% (7 / 7)

See how this tool was set up for the test.

  • Verified Setup Less Than 20 Minutes
  • Verified Connects to Data Warehouse
  • Verified Handles 1TB Table
  • Verified Handles 10+ Tables
  • Verified No Table Structure Changes
  • Verified No SQL Expertise for Setup

DA-bench Results for Run #101 — Test Score: 16.6% (34 / 205)

Data Querying (27 / 120)
14 Correct Answers, 4 Hallucinations
Question  Date Tested  Overall Score  Description
dq01      2024-10-28    4             Perform an aggregation on an explicit column
dq02      2024-10-28    4             Perform an aggregation with an explicit table but not an inferred column
dq03      2024-10-28    4             Perform an aggregation with an implicit table and implicit column
dq04      2024-10-28    2             Find and compare information across tables without joins
dq05      2024-10-28    4             Work with non-literal values
dq06      2024-10-28    4             Work with non-literal values and non-SQL data manipulation
dq07      2024-10-28    4             Deal with common acronyms and more advanced aggregations
dq08      2024-10-28   -5             Multi-step queries
dq09      2024-10-28    3             Aggregations with numeric predicates to filter
dq10      2024-10-28    4             Aggregations with categorical predicates to filter
dq11      2024-10-28    0             Schema review
dq12      2024-10-28    0             Aggregate records that are filtered with a predicate requiring a join
dq13      2024-10-28   -5             Recognizes truly ambiguous queries
dq14      2024-10-28   -5             Can handle boolean features
dq15      2024-10-28    0             Handles ambiguous column names
dq16      2024-10-28   -5             Understands that set operations require consideration of overlap
dq17      2024-10-28    0             Finds relevant values inside a column to answer questions
dq18      2024-10-28    0             Look up a single record by ID
dq19      2024-10-28    3             Perform an aggregation by a different name and a second query from that
dq20      2024-10-28    3             Perform an aggregation based on a very different question name
dq21      2024-10-28    1             Perform a filter and an unusually-phrased aggregation in the correct order
dq23      2024-10-28    4             Can handle incorrect column names well
dq24      2024-10-28    0             Schema review
dq25      2024-10-28    3             Work with non-literal values

Feature Engineering (4 / 35)
4 Correct Answers, 2 Hallucinations
Question  Date Tested  Overall Score  Description
fe1       2024-10-28    3             Make a boolean indicator feature for a criteria set
fe2       2024-10-28    5             Make a categorical feature from a criteria set
fe4       2024-10-28    4             Combining two input columns
fe5       2024-11-01   -5             Sentiment
fe6       2024-10-28    2             Phrase Identification in Text
fe7       2024-10-28   -5             Advanced NLP
fe8       2024-10-28    0             Advanced NLP

Insight Identification (4 / 25)
2 Correct Answers, 0 Hallucinations
Question  Date Tested  Overall Score  Description
ii2       2024-10-28    1             Compare an aggregation for two distinct subsets of data
ii5       2024-10-28    0             Identifying basic trends on short timelines
ii6       2024-10-28    0             Understands statistical significance
ii7       2024-10-28    0             Understands derivatives
ii8       2024-10-28    3             Can use NLP feature engineering as part of an insight request

Learning (2 / 10)
1 Correct Answer, 0 Hallucinations
Question  Date Tested  Overall Score  Description
l1        2024-10-28    2             Can remember the meanings of oddly-named columns
l2        2024-10-28    0             Can remember criteria sets under a single name

Visualization (-3 / 15)
2 Correct Answers, 0 Hallucinations
Question  Date Tested  Overall Score  Description
v1        2024-10-29    0             Basic Charting
v2        2024-10-29   -1             Charting with two series
v3        2024-10-29   -2             Categorical Charts
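
The category totals reported above can be cross-checked against the overall Test Score. The following is a minimal sketch in Python, assuming the Test Score is simply total points earned divided by total points possible. That is an inference that matches the reported figures, not a documented DA-bench formula, and the variable names are illustrative.

```python
# (points earned, points possible) per category, copied from the report.
categories = {
    "Data Querying": (27, 120),
    "Feature Engineering": (4, 35),
    "Insight Identification": (4, 25),
    "Learning": (2, 10),
    "Visualization": (-3, 15),
}

# Category totals can be negative because individual questions can score below zero.
earned = sum(points for points, _ in categories.values())
possible = sum(maximum for _, maximum in categories.values())

test_score_pct = 100.0 * earned / possible
print(f"{earned} / {possible} = {test_score_pct:.1f}%")  # prints "34 / 205 = 16.6%"
```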