Commit a12c7f8

Update results
1 parent 0e5a1ea commit a12c7f8

2 files changed: +117 -9 lines changed

index.html

+11 -9
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
 <p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
 </div>
 <div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated January 26, 2025.</p>
+<p>Tests are run every day at 1am PT. Last updated January 27, 2025.</p>
 <p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
 </div>
 <div class="header_cta">
@@ -58,12 +58,12 @@ <h1>How's GPT-4o Doing?</h1>
 <div class="feature_header" style="min-height: auto">
 <div class="feature_header_text" style="gap: var(--spacing-sizing-4)">
 <h2>Response Time</h2>
-<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>3.74 seconds</b> per request.</p>
+<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>3.73 seconds</b> per request.</p>
 <p class="subtitle">This number only accounts for requests made by this application.</p>
 </div>
 <div class="chart">
 <div class="chart_box chart_box_green">
-<p>3.74 s</p>
+<p>3.73 s</p>
 </div>
 </div>
 </div>
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
 <h3><span class="explainer_icon far fa-image"></span>Image</h3>
 <img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
 <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>7</pre>
+<pre>8</pre>
 <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
 </div>
 </div>
@@ -230,7 +230,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
 <h3><span class="explainer_icon far fa-image"></span>Image</h3>
 <img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
 <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.4, 'y': 0.35, 'width': 0.3, 'height': 0.25}</pre>
+<pre>{'x': 0.42, 'y': 0.35, 'width': 0.18, 'height': 0.28}</pre>
 <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
 </div>
 </div>
@@ -361,7 +361,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
 {
 "R": 82,
 "G": 0,
-"B": 128
+"B": 138
 }
 ```</pre>
 <p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -403,7 +403,7 @@ <h2>Annotation Quality Assurance</h2>
 </div>
 </div>
 <p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.017</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.018</p>
 </div>
 <div class="explainer_dropdown">
 <button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -417,10 +417,12 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
 <h3><span class="explainer_icon far fa-image"></span>Image</h3>
 <img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
 <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>It appears that the dataset captures cars on the road with bounding boxes (red boxes). The image shows several cars labeled correctly, but there is at least one car (the white car on the right) that seems unlabeled.
+<pre>To count the missing annotations, I would need to know the total number of cars visible in the image versus the number of cars with red bounding boxes. Based on the image:
 
-Here's the result in JSON format:
+1. **Cars annotated with bounding boxes:** There are 6 red bounding boxes visible.
+2. **Cars visible in the scene, including unannotated ones:** It appears there is one car near the farthest end of the scene without a bounding box on it.
 
+### JSON Output:
 ```json
 {
 "missing": 1

results/2025-01-27.json

+106
@@ -0,0 +1,106 @@
+{
+"zero_shot_classification": {
+"score": 1,
+"success": true,
+"price": 0.006400000000000001,
+"pass_fail": "Pass",
+"response_time": 1.9188237190246582,
+"result": "Toyota Camry"
+},
+"count_fruit": {
+"score": 0,
+"success": false,
+"price": 0.00882,
+"pass_fail": "Fail",
+"response_time": 1.8202457427978516,
+"result": "8"
+},
+"document_ocr": {
+"score": 0,
+"success": false,
+"price": 0.00988,
+"pass_fail": "Fail",
+"response_time": 2.523519992828369,
+"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the *Midnights* album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
+},
+"handwriting_ocr": {
+"score": 1,
+"success": true,
+"price": 0.00974,
+"pass_fail": "Pass",
+"response_time": 8.458041667938232,
+"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
+},
+"extraction_ocr": {
+"score": 1.0,
+"success": true,
+"price": 0.00876,
+"pass_fail": "Pass",
+"response_time": 2.442870616912842,
+"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
+},
+"math_ocr": {
+"score": 1.0,
+"success": true,
+"price": 0.015070000000000002,
+"pass_fail": "Pass",
+"response_time": 3.94942045211792,
+"result": "3x^2-6x+2"
+},
+"object_detection": {
+"score": 0.3245492371705965,
+"success": false,
+"price": 0.01044,
+"pass_fail": "Fail",
+"response_time": 2.722054958343506,
+"result": "{'x': 0.42, 'y': 0.35, 'width': 0.18, 'height': 0.28}"
+},
+"graph_understanding": {
+"score": 0.99,
+"success": false,
+"price": 0.01174,
+"pass_fail": "Fail",
+"response_time": 2.396726131439209,
+"result": "```json\n{\n \"A\": {\n \"quantity\": 20,\n \"price\": 10\n },\n \"B\": {\n \"quantity\": 25,\n \"price\": 20\n },\n \"C\": {\n \"quantity\": 30,\n \"price\": 30\n },\n \"D\": {\n \"quantity\": 35,\n \"price\": 40\n }\n}\n```"
+},
+"color_recognition": {
+"score": 0.9673202614379085,
+"success": false,
+"price": 0.009850000000000001,
+"pass_fail": "Fail",
+"response_time": 2.0315816402435303,
+"result": "```json\n{\n \"R\": 82,\n \"G\": 0,\n \"B\": 138\n}\n```"
+},
+"annotation_qa": {
+"score": 0.33333333333333337,
+"success": false,
+"price": 0.01787,
+"pass_fail": "Fail",
+"response_time": 3.923919439315796,
+"result": "To count the missing annotations, I would need to know the total number of cars visible in the image versus the number of cars with red bounding boxes. Based on the image:\n\n1. **Cars annotated with bounding boxes:** There are 6 red bounding boxes visible.\n2. **Cars visible in the scene, including unannotated ones:** It appears there is one car near the farthest end of the scene without a bounding box on it.\n\n### JSON Output:\n```json\n{\n \"missing\": 1\n}\n```"
+},
+"measurement": {
+"score": 0.8571428571428572,
+"success": false,
+"price": 0.009720000000000001,
+"pass_fail": "Fail",
+"response_time": 3.5829694271087646,
+"result": "```json\n{\n \"length\": 3.0,\n \"width\": 3.0\n}\n```"
+},
+"easy_captcha": {
+"score": 1,
+"success": true,
+"price": 0.00636,
+"pass_fail": "Pass",
+"response_time": 1.9920110702514648,
+"result": "charybdis indubitable"
+},
+"easy_captcha_persuade": {
+"score": 1,
+"success": true,
+"price": 0.006860000000000001,
+"pass_fail": "Pass",
+"response_time": 1.6189548969268799,
+"result": "charybdis indubitable"
+}
+}
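
A results file with this shape (one object per test, each with `success`, `price`, and `response_time`) can be summarized with a few lines of Python. The sketch below is illustrative only: `SAMPLE` is a hypothetical three-test excerpt with rounded values, and `summarize` is a name introduced here, not a function from the repository. It does not claim to reproduce the 3.73 seconds shown in index.html, since this commit does not show how that figure is aggregated.

```python
import json
from statistics import mean

# Hypothetical excerpt shaped like results/2025-01-27.json.
# Values are rounded and only three of the tests are included.
SAMPLE = """
{
  "zero_shot_classification": {"success": true, "price": 0.0064, "response_time": 1.92},
  "count_fruit": {"success": false, "price": 0.00882, "response_time": 1.82},
  "math_ocr": {"success": true, "price": 0.01507, "response_time": 3.95}
}
"""


def summarize(results: dict) -> dict:
    """Aggregate pass count, total cost, and mean latency for one results file."""
    entries = list(results.values())
    return {
        "tests": len(entries),
        "passed": sum(1 for entry in entries if entry["success"]),
        "total_price": round(sum(entry["price"] for entry in entries), 5),
        "mean_response_time": round(mean(entry["response_time"] for entry in entries), 2),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

To run it against a real file, replace `json.loads(SAMPLE)` with `json.load(open(path))` for whichever results path is being inspected.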
