DA-bench

Visual Benchmark for Data Analytics AI Agents

Run #17

  • Tool: julius
  • Date Tested: 2024-06-09

DA-bench Results for Run #17 — Overall Score: 37.6% (79 / 210)

Data Querying (24 / 105)
Question Date Tested Score Video Recording

dq01
Perform an aggregation on an explicit column

June 9, 2024 - 12:34 AM 0

dq02
Perform an aggregation with an explicit table but not an inferred column

June 9, 2024 - 12:36 AM 0

dq03
Perform an aggregation with an implicit table and implicit column

June 9, 2024 - 12:37 AM 0

dq04
Find and compare information across tables without joins

June 9, 2024 - 12:40 AM 4

dq05
Work with non-literal values

June 9, 2024 - 12:44 AM 5

dq06
Work with non-literal values and non-SQL data manipulation

June 9, 2024 - 12:45 AM 5

dq07
Deal with common acronyms and more advanced aggregations

June 9, 2024 - 12:46 AM 0

dq08
Multi-step queries

June 9, 2024 - 12:47 AM 5

dq09
Aggregations with numeric predicates to filter

June 9, 2024 - 12:49 AM 0

dq10
Aggregations with categorical predicates to filter

June 9, 2024 - 12:51 AM 0

dq11
Schema review

June 9, 2024 - 12:52 AM 0

dq12
Aggregate records that are filtered with a predicate requiring a join

June 9, 2024 - 12:54 AM 0

dq13
Recognizes truly ambiguous queries

June 9, 2024 - 12:55 AM 0

dq14
Can handle boolean features

June 9, 2024 - 12:57 AM 0

dq15
Handles ambiguous column names

June 9, 2024 - 01:00 AM 0

dq16
Understands that set operations require consideration of overlap

June 9, 2024 - 01:03 AM 0

dq17
Finds relevant values inside a column to answer questions

June 9, 2024 - 01:05 AM 0

dq18
Look up a single record by ID

June 9, 2024 - 01:06 AM 5

dq19
Perform an aggregation by a different name and a second query from that

June 9, 2024 - 01:07 AM 0

dq20
Perform an aggregation based on a very different question name

June 9, 2024 - 01:09 AM 0

dq21
Perform a filter and an unusually-phrased aggregation in the correct order

June 9, 2024 - 01:12 AM 0

Visualization (0 / 15)
Question Date Tested Score Video Recording

v1
Basic Charting

June 9, 2024 - 01:14 AM 0

v2
Charting with two series

June 9, 2024 - 01:17 AM 0

v3
Categorical Charts

June 9, 2024 - 01:18 AM 0

Feature Engineering (30 / 35)
Question Date Tested Score Video Recording

fe1
Make an indicator feature for a criteria set

June 9, 2024 - 01:19 AM 0

fe2
Make a categorical feature from a criteria set

June 9, 2024 - 01:21 AM 5

fe3
Min-max normalization

June 9, 2024 - 01:22 AM 5

fe4
Combining two input columns

June 9, 2024 - 01:25 AM 5

fe5
Sentiment

June 9, 2024 - 01:27 AM 5

fe6
Phrase Identification in Text

June 9, 2024 - 01:28 AM 5

fe7
Advanced NLQ

June 9, 2024 - 01:30 AM 5

Insight Identification (25 / 40)
Question Date Tested Score Video Recording

ii1
Basic correlation

June 9, 2024 - 01:33 AM 5

ii2
Compare an aggregation for two distinct subsets of data

June 9, 2024 - 01:35 AM 0

ii3
Basic correlation, but requiring aggregation from another table

June 9, 2024 - 01:39 AM 5

ii4
Outlier identification

June 9, 2024 - 01:40 AM 5

ii5
Identifying basic trends on short timelines

June 9, 2024 - 01:41 AM 0

ii6
Understands statistical significance

June 9, 2024 - 01:42 AM 5

ii7
Understands derivatives

June 9, 2024 - 01:44 AM 0

ii8
Can use NLQ feature engineering as part of an insight request

June 9, 2024 - 01:46 AM 5

Learning (0 / 10)
Question Date Tested Score Video Recording

l1
Can remember the meanings of oddly-named columns

June 9, 2024 - 01:47 AM 0

l2
Can remember criteria sets under a single name

June 9, 2024 - 01:49 AM 0

Domain Knowledge (0 / 5)
Question Date Tested Score Video Recording

dk01
Column Relevance Determination

June 9, 2024 - 01:50 AM 0
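
The overall score reported for this run can be cross-checked against the category subtotals listed above. The sketch below assumes the overall figure is simply total points earned divided by total points available across all categories; the page does not state this rule, but the numbers are consistent with it. The names `CATEGORY_SCORES` and `overall_score` are illustrative and are not part of DA-bench.

```python
# Category subtotals as (points earned, points available), copied from the
# section headings of this report.
CATEGORY_SCORES = {
    "Data Querying": (24, 105),
    "Visualization": (0, 15),
    "Feature Engineering": (30, 35),
    "Insight Identification": (25, 40),
    "Learning": (0, 10),
    "Domain Knowledge": (0, 5),
}


def overall_score(scores):
    """Return (earned, available, percent), assuming an unweighted sum of points."""
    earned = sum(e for e, _ in scores.values())
    available = sum(a for _, a in scores.values())
    return earned, available, 100.0 * earned / available


earned, available, pct = overall_score(CATEGORY_SCORES)
print(f"{pct:.1f}% ({earned} / {available})")  # prints "37.6% (79 / 210)"
```

Under this assumption the subtotals add up to 79 of 210 points, matching the headline figure. The 210-point maximum is also consistent with 42 questions worth up to 5 points each, although the per-question maximum is inferred from the scores shown rather than stated on the page.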