Add support for use_answer_as_expected_output and use_answer_as_test_code in evaluation function
Enhances `io_test` and `unit_test` modes by allowing the answer field to serve as the reference solution or test code. Updates evaluation logic, adds unit tests, and refines documentation across `CLAUDE.md`, `README.md`, `user.md`, and `dev.md` to detail usage and advantages.
README.md
Lines changed: 14 additions & 0 deletions
@@ -31,6 +31,8 @@ The function supports three modes, set via `params.mode`.
31
31
}
32
32
```
33
33
34
+
Bare expressions (e.g. `5 * 5`) print automatically without `print()`, like a Jupyter notebook cell.
35
+
34
36
**`io_test`** — compare stdout against expected output for each test case:
35
37
36
38
```json
@@ -46,6 +48,8 @@ The function supports three modes, set via `params.mode`.
46
48
}
47
49
```
48
50
51
+
Set `"use_answer_as_expected_output": true` to run the `answer` (reference solution) against each test's input instead of hardcoding `expected_output`. Variable injection via `inject` is also supported as an alternative to stdin.
52
+
49
53
**`unit_test`** — run student code then execute `test_*` functions or `unittest.TestCase` subclasses (including Hypothesis tests):
50
54
51
55
```json
@@ -58,6 +62,8 @@ The function supports three modes, set via `params.mode`.
58
62
}
59
63
```
60
64
65
+
Set `"use_answer_as_test_code": true` to read test code from the `answer` field instead of `params.test_code` — useful in the LF UI where the answer field is a proper code editor.
docs/dev.md
Lines changed: 33 additions & 2 deletions
@@ -16,7 +16,7 @@
16
16
```json
17
17
{
18
18
"response": "<student code string>",
19
-
"answer": "<unused — may be null>",
19
+
"answer": "<reference solution — used when use_answer_as_test_code or use_answer_as_expected_output is set>",
20
20
"params": { ... }
21
21
}
22
22
```
@@ -35,6 +35,8 @@ Run student code with no stdin and return its stdout as output feedback. No pass
35
35
}
36
36
```
37
37
38
+
If the last statement is a bare expression (e.g. `3.14 * 2 * 5`), it is automatically wrapped in `print(repr(...))` so it prints like a REPL. Existing `print()` calls are not double-wrapped.
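
The wrapping described above could be sketched with the standard `ast` module. This is an illustrative guess at the technique, not the repository's implementation; the function name `wrap_last_expression` is hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def wrap_last_expression(source: str) -> str:
    """Return source with a trailing bare expression wrapped in print(repr(...)).

    Hypothetical helper: a sketch of the REPL-style behaviour, not the real code.
    """
    tree = ast.parse(source)
    if not tree.body or not isinstance(tree.body[-1], ast.Expr):
        return source  # last statement is not a bare expression

    last = tree.body[-1]
    value = last.value
    # Leave an existing print(...) call alone so it is not double-wrapped.
    if (
        isinstance(value, ast.Call)
        and isinstance(value.func, ast.Name)
        and value.func.id == "print"
    ):
        return source

    repr_call = ast.Call(
        func=ast.Name(id="repr", ctx=ast.Load()), args=[value], keywords=[]
    )
    print_call = ast.Call(
        func=ast.Name(id="print", ctx=ast.Load()), args=[repr_call], keywords=[]
    )
    tree.body[-1] = ast.Expr(value=print_call)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Under these assumptions, `wrap_last_expression("x = 2\nx * 3")` yields code whose last line is `print(repr(x * 3))`, while source ending in `print(...)` or in an assignment is returned unchanged.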
39
+
38
40
Feedback tags produced: `output` (stdout + any plots), or `error` (timeout / runtime error).
39
41
40
42
---
@@ -67,14 +69,32 @@ Each test case uses either `input` (stdin-based) or `inject` (variable injection
67
69
|-------|-------------|
68
70
|`input`| Text piped to stdin. Mutually exclusive with `inject`. |
69
71
|`inject`| Dict of `{variable_name: value}` prepended as assignments before student code. Values can be any JSON type. Mutually exclusive with `input`. |
70
-
|`expected_output`| Expected stdout; trailing whitespace stripped before comparison. |
72
+
|`expected_output`| Expected stdout; trailing whitespace stripped before comparison. Required unless `use_answer_as_expected_output` is set. |
71
73
|`hidden`|`true` = suppress input/variables and expected output from feedback. |
72
74
73
75
- `tests` is required; an empty list sets `is_correct = true` with `0/0 tests passed`.
74
76
- `hidden: true` replaces details with `"Hidden test N: failed."` so students cannot reverse-engineer the answer.
75
77
- With `inject`, feedback shows a "Variables:" block (e.g. `n = 5`) instead of "Input:".
78
+
- Bare final expressions in student code are auto-wrapped in `print(repr(...))` (REPL behaviour).
76
79
- Matplotlib figures generated during a test are uploaded to S3 and embedded in the feedback.
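
As a rough illustration of how `inject` values might be turned into assignments prepended to the student code, the sketch below renders each entry with `repr`. The helper name `build_inject_prelude` is hypothetical, and the approach assumes the values come from parsed JSON (so `repr` yields valid Python literals).

```python
def build_inject_prelude(inject: dict) -> str:
    """Render {variable_name: value} as Python assignment lines.

    Hypothetical helper; assumes values are plain JSON types
    (dict, list, str, int, float, bool, None).
    """
    lines = []
    for name, value in inject.items():
        if not name.isidentifier():
            raise ValueError(f"not a valid variable name: {name!r}")
        lines.append(f"{name} = {value!r}")
    return ("\n".join(lines) + "\n") if lines else ""


# The prelude is placed before the student code prior to execution.
student_code = "print(n * n)"
combined = build_inject_prelude({"n": 5}) + student_code
```

The same rendered lines (for example `n = 5`) are what a "Variables:" block in the feedback could display.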
77
80
81
+
#### `use_answer_as_expected_output`
82
+
83
+
When `true`, the `answer` argument (reference solution code) is executed with the same input/inject as each test, and its stdout is used as the expected output. The `expected_output` field on each test object is ignored.
84
+
85
+
```json
{
  "mode": "io_test",
  "use_answer_as_expected_output": true,
  "tests": [
    { "input": "5\n" },
    { "inject": {"n": 5} }
  ]
}
```
95
+
96
+
This avoids hardcoding expected outputs in params — useful when the LF UI code editor holds the reference solution.
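
A minimal sketch of the idea, assuming each program is run in a fresh interpreter via `subprocess`: the reference solution and the student code receive the same stdin, and their outputs are compared after stripping trailing whitespace. The names `run_code` and `matches_reference` are hypothetical, and the 25-second default mirrors the per-test timeout mentioned in `user.md`; the actual evaluation logic may differ.

```python
import subprocess
import sys


def run_code(code: str, stdin_text: str = "", timeout: float = 25.0) -> str:
    """Run code in a separate Python process and return its stdout.

    Hypothetical helper; raises subprocess.TimeoutExpired on timeout.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def matches_reference(response: str, answer: str, stdin_text: str) -> bool:
    """Use the reference solution's stdout as the expected output."""
    expected = run_code(answer, stdin_text).rstrip()
    actual = run_code(response, stdin_text).rstrip()
    return actual == expected
```

A real implementation would also need to handle the case where the reference solution itself fails or times out, which this sketch leaves out.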
97
+
78
98
Feedback tags produced per test: `pass`, `fail`, or `hidden_fail`. Global: `summary`, `error` (timeout / runtime error).
79
99
80
100
---
@@ -98,6 +118,17 @@ Append teacher-supplied test code to the student submission, then execute the co
98
118
- Student `print()` calls do not pollute test results (stdout is discarded; results are passed via a temp JSON file).
99
119
- `is_correct` is `true` only when all tests pass and at least one test ran.
100
120
121
+
#### `use_answer_as_test_code`
122
+
123
+
When `true`, the `answer` argument is used as the test code instead of `params["test_code"]`. This is preferred when using the LF UI, whose params field is a plain JSON editor (poor for multiline code) while the answer field is a proper code editor.
124
+
125
+
```json
{
  "mode": "unit_test",
  "use_answer_as_test_code": true
}
```
131
+
101
132
Feedback tags produced per test: `pass`, `fail`. Global: `summary`, `error` (timeout / module-level crash / empty test_code).
docs/user.md
Lines changed: 48 additions & 1 deletion
@@ -21,7 +21,7 @@ Runs the student's code and shows them their output. No pass/fail verdict is giv
21
21
{ "mode": "demo" }
22
22
```
23
23
24
-
Students see their stdout and any matplotlib figures they produced.
24
+
Students see their stdout and any matplotlib figures they produced. Bare expressions (e.g. `3.14 * 2 * 5`) print automatically without needing `print()`, just like a Jupyter notebook.
25
25
26
26
---
27
27
@@ -59,6 +59,32 @@ Each test case uses **either** `input` (student reads via `input()`) **or** `inj
59
59
- You can mix visible and hidden tests in the same question.
60
60
- Matplotlib figures produced during a passing or failing test are shown to the student.
61
61
- A 25-second per-test timeout applies; timed-out tests count as failures.
62
+
- Students can write bare expressions (e.g. `3.14 * r * r`) without `print()` — the output is captured automatically.
63
+
64
+
### Using the answer field as the reference solution
65
+
66
+
If you set `"use_answer_as_expected_output": true`, you can write your reference solution in the **answer** field (the code editor in the LF UI) instead of hardcoding `expected_output` in every test case. The system runs your solution with each test's input and uses its output as the expected result.
67
+
68
+
**Params**
69
+
```json
{
  "mode": "io_test",
  "use_answer_as_expected_output": true,
  "tests": [
    { "input": "5\n" },
    { "input": "0\n" },
    { "input": "-3\n", "hidden": true }
  ]
}
```
80
+
81
+
**Answer field** (reference solution):
82
+
```python
n = int(input())
print(n * n)
```
86
+
87
+
This is especially convenient when the reference solution is already in the answer field for the worked solution display — you don't need to duplicate the expected outputs.
- Student `print()` calls do not affect test results.
160
186
- A 25-second total timeout applies to the entire execution.
161
187
188
+
### Writing test code in the answer field
189
+
190
+
The LF params editor is a plain JSON editor, which makes writing multiline test code awkward. Instead, set `"use_answer_as_test_code": true` and write your test functions in the **answer** field (the proper code editor):
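
As a purely illustrative sketch of what such test code might look like, the block below uses plain `test_*` functions. The function `square` and the test names are assumptions made for this example, not taken from the repository; the first definition stands in for a student submission so the block runs on its own.

```python
# Stand-in for the student's submission (hypothetical); in practice this
# comes from the student's response, not from the answer field.
def square(n):
    return n * n


# Test code as it might be written in the answer field.
def test_square_of_positive():
    assert square(5) == 25


def test_square_of_zero():
    assert square(0) == 0


def test_square_of_negative():
    assert square(-3) == 9
```

With `"mode": "unit_test"` and `"use_answer_as_test_code": true` in the params, only the `test_*` functions would go in the answer field.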