Explanations
negative outcomes
The neuron activates on words that signal risk, uncertainty, or potential negative outcomes (e.g. jeopardy, forced, may, risk).
Negative Logits
Token        Logit
/mat         -0.07
-pills       -0.07
YE           -0.06
或者         -0.06
menn         -0.06
基           -0.06
tty          -0.06
стандарт     -0.06
很多         -0.06
_Variable    -0.06
Positive Logits
Token        Logit
drifted      0.07
xBD          0.07
disb         0.06
cntl         0.06
_REMOVE      0.06
�            0.06
=obj         0.06
oppress      0.06
cripcion     0.06
ジ           0.06
Activations Density 0.044%
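
The density figure above can be read as the share of tokens on which the neuron fires at all. The sketch below illustrates that reading under an assumption: that density is the fraction of sampled tokens with activation above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires (activation > 0). The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 tokens, 4 of which activate the neuron.
sample = [0.0] * 9996 + [0.8, 1.3, 0.2, 2.1]
print(f"{activation_density(sample):.3%}")  # prints 0.040%
```

On this reading, a density of 0.044% would mean the neuron is active on roughly 4 or 5 tokens out of every 10,000 sampled.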