INDEX
Explanations
conflict, war, espionage
The neuron detects cautionary or warning language that signals avoiding detection or attention (e.g. instructions to be “careful,” to “avoid” something, or not to attract notice).
Negative Logits
됨          -0.06
).[         -0.06
повідом     -0.06
cinemas     -0.06
陵          -0.06
Traditional -0.06
.[          -0.06
zoo         -0.06
لن          -0.06
sordu       -0.06
Positive Logits
.parser       0.07
38            0.07
Serialization 0.06
.accuracy     0.06
triang        0.06
выбор         0.06
inoc          0.06
니아          0.06
Dispatcher    0.06
seedu         0.06
Activations Density 0.018%