Explanations
Informal instructions/text
The neuron activates on language about acquiring or exerting power, control, influence, or dominance.
Negative Logits
의            -0.07
اخبار         -0.06
„V            -0.06
_readable     -0.06
盟            -0.06
O             -0.06
Finch         -0.06
�             -0.06
EY            -0.06
referencia    -0.06
Positive Logits
gym           0.06
hawks         0.06
amura         0.06
-bs           0.06
.algorithm    0.06
tenant        0.06
caff          0.06
jazz          0.06
res           0.06
。“           0.06
Activations Density: 0.035%