chatgpt — Run #157 (2024-12-12)
Using the GPT-4o model
Overall Score: 54.4%
(40% Setup Score + 60% Test Score)
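The weighting above can be checked against the reported numbers. The sketch below is only an illustration: the function name `overall_score` is hypothetical, and it assumes the percentages are computed from the raw counts reported for this run (4 / 7 for setup, 113 / 215 for tests) before rounding.

```python
def overall_score(setup_pct: float, test_pct: float) -> float:
    """Weighted overall score: 40% setup score plus 60% test score."""
    return 0.4 * setup_pct + 0.6 * test_pct


# Raw counts taken from this run's report.
setup_pct = 100 * 4 / 7      # Setup Score, about 57.1%
test_pct = 100 * 113 / 215   # Test Score, about 52.6%

overall = overall_score(setup_pct, test_pct)
print(f"{overall:.1f}%")  # prints "54.4%"
```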
DA-bench Setup for Run #157 — Setup Score: 57.1% (4 / 7)
- Setup Less Than 20 Minutes
- Connects to Data Warehouse
- Handles 1TB Table
- No Individual Upload of Files
- No Python for Setup
- No SQL for Setup
DA-bench Results for Run #157 — Test Score: 52.6% (113 / 215)
Data Querying (78 / 120)
19 Correct Answers, 3 Hallucinations
Domain Knowledge (5 / 5)
1 Correct Answer, 0 Hallucinations
Question | Date Tested | Overall Score | Video Recording |
---|---|---|---|
dk01 | 2024-12-12 | 5 | |
Feature Engineering (30 / 40)
7 Correct Answers, 1 Hallucination
Question | Date Tested | Overall Score | Video Recording |
---|---|---|---|
fe1 | 2024-12-13 | 5 | |
fe2 | 2024-12-13 | 5 | |
fe3 | 2024-12-13 | 5 | |
fe4 | 2024-12-13 | 5 | |
fe5 | 2024-12-13 | -5 | |
fe6 | 2024-12-13 | 5 | |
fe7 | 2024-12-13 | 5 | |
fe8 | 2024-12-13 | 5 | |
Insight Identification (15 / 25)
4 Correct Answers, 1 Hallucination
Question | Date Tested | Overall Score | Video Recording |
---|---|---|---|
ii2 | 2024-12-16 | 5 | |
ii5 | 2024-12-13 | 5 | |
ii6 | 2024-12-16 | 5 | |
ii7 | 2024-12-13 | -5 | |
ii8 | 2024-12-16 | 5 | |
Learning (-10 / 10)
0 Correct Answers, 2 Hallucinations
Question | Date Tested | Overall Score | Video Recording |
---|---|---|---|
l1 | 2024-12-13 | -5 | |
l2 | 2024-12-13 | -5 | |
Visualization (-5 / 15)
1 Correct Answer, 2 Hallucinations
Question | Date Tested | Overall Score | Video Recording |
---|---|---|---|
v1 | 2024-12-13 | 5 | |
v2 | 2024-12-13 | -5 | |
v3 | 2024-12-13 | -5 | |
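The category totals reported above can be summed to reproduce the Test Score of 52.6% (113 / 215). The sketch below does that, then tallies the Feature Engineering question scores. It assumes that a score of 5 marks a correct answer and -5 marks a hallucination; that rule is inferred from the per-question tables and is not stated by the benchmark. The Data Querying total of 78 is not a multiple of 5, so other per-question values presumably exist. All variable names are hypothetical.

```python
# (earned, possible) points per category, copied from the results above.
category_scores = {
    "Data Querying": (78, 120),
    "Domain Knowledge": (5, 5),
    "Feature Engineering": (30, 40),
    "Insight Identification": (15, 25),
    "Learning": (-10, 10),
    "Visualization": (-5, 15),
}

earned = sum(e for e, _ in category_scores.values())
possible = sum(p for _, p in category_scores.values())
test_pct = 100 * earned / possible
print(f"{earned} / {possible} = {test_pct:.1f}%")  # prints "113 / 215 = 52.6%"

# Assumption (inferred, not documented): +5 = correct, -5 = hallucination.
POINTS = 5
fe_question_scores = [5, 5, 5, 5, -5, 5, 5, 5]  # fe1 through fe8
fe_total = sum(fe_question_scores)
fe_correct = sum(1 for s in fe_question_scores if s == POINTS)
fe_hallucinations = sum(1 for s in fe_question_scores if s == -POINTS)
print(fe_total, fe_correct, fe_hallucinations)  # prints "30 7 1"
```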