Explanations
incorrect, false, wrong
The neuron detects tokens that signal a correction or negation of a preceding statement (e.g. “incorrect,” “isn’t,” “false,” “wrong,” “quite right”).
Negative Logits
Token              Logit
ंब                 -0.06
outfit             -0.06
Если               -0.06
IntoConstraints    -0.06
Utf                -0.06
امبر               -0.06
ソ                 -0.06
ฺ                  -0.05
Sup                -0.05
aklı               -0.05
Positive Logits
Token              Logit
induces            0.08
inexperienced      0.07
_qs                0.07
(int               0.07
Shea               0.07
aster              0.07
Url                0.07
(CH                0.06
HAL                0.06
переж              0.06
Activations Density 0.044%
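
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that density is the percentage of sampled tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The page itself does not define the term.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 10,000 tokens, 4 of which activate the neuron.
sample = [0.0] * 9996 + [0.3, 1.2, 0.7, 2.5]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.044% would correspond to the neuron firing on roughly 4 or 5 tokens out of every 10,000 sampled.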