INDEX
Explanations
math problems, equations
structural and discourse cues of the model’s step-by-step math solution (e.g., response headers, newlines/section breaks, and procedural lead-ins indicating the start of an explanation).
Negative Logits
es         0.50
hes        0.46
s          0.46
iever      0.46
Therefore  0.46
rain       0.46
ations     0.43
al         0.43
herd       0.43
en         0.42
Positive Logits
учиты      0.45
Inspect    0.45
සෑම        0.45
ष्          0.45
鐏         0.44
gewoon     0.44
Inspect    0.43
ט          0.43
آبی        0.43
酤         0.43
Activations Density 0.051%