Explanations
The neuron activates on tokens that indicate making or applying modifications (e.g., “make,” “change,” “patch,” “modify,” “adjustment,” “rotation”).
Negative Logits
Token        Logit
Technical    -0.07
PLAYER       -0.07
%%           -0.06
disdain      -0.06
.End         -0.06
UI           -0.06
IsNot        -0.06
API          -0.06
tty          -0.06
/ajax        -0.06
Positive Logits
Token        Logit
практически  0.07
帶           0.07
cosa         0.07
片           0.06
reaction     0.06
thrill       0.06
ruth         0.06
افت          0.06
flew         0.06
bottle       0.06
Activations Density 0.088%
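An activation density figure like this one is usually read as the fraction of sampled tokens on which the neuron fires at all. The page does not define the metric, so the sketch below assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and the sample values are hypothetical and synthetic, chosen only to show the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density counts tokens with activation > threshold,
    divided by the total number of tokens sampled.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic example: 88 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_912 + [0.5] * 88
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on the vast majority of tokens, which is consistent with an explanation tied to a narrow set of words.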