Explanations
The neuron activates on the Russian first‐person pronoun “я.”
Negative Logits
ANTE        -0.07
chasing     -0.06
порів       -0.06
коли        -0.06
影          -0.06
мот         -0.06
atre        -0.06
T           -0.06
泛          -0.06
atori       -0.06
Positive Logits
cylinders   0.07
..↵         0.07
년          0.07
Democratic  0.07
.install    0.07
bal         0.06
archs       0.06
_voltage    0.06
idlo        0.06
_lifetime   0.06
Activations Density 0.013%