Explanations
The neuron detects instances of the phrase “driven by,” flagging texts that introduce a motivating cause or agent.
Negative Logits
Token      Logit
std        -0.08
состав     -0.08
_for       -0.07
(fout      -0.07
ِه         -0.07
slag       -0.07
roken      -0.07
logout     -0.07
urança     -0.07
στον       -0.07
Positive Logits
Token      Logit
These      0.07
Forge      0.07
governed   0.06
Certain    0.06
BB         0.06
piration   0.06
روشن       0.06
mystical   0.06
-looking   0.06
Bright     0.06
Activations Density 0.020%
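
A density figure like the one above can be read as the fraction of sampled token positions on which the neuron fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with an
    activation strictly above the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 2 of which activate.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

A neuron this sparse fires on roughly 2 in every 10,000 tokens, which fits a detector keyed to one specific phrase.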