Explanations
This neuron activates on words used in explicit definitional or explanatory clauses—particularly the relative pronoun “which” and the copula “is.”
Negative Logits
Kok         -0.06
Fast        -0.06
<r          -0.06
Monument    -0.06
QRect       -0.06
Routing     -0.06
OSC         -0.06
Title       -0.05
pickle      -0.05
metric      -0.05
Positive Logits
etes        0.08
فایل        0.07
passer      0.07
$")↵        0.07
бан         0.07
breathable  0.07
弹          0.06
_lua        0.06
Whether     0.06
การ         0.06
Activations Density 0.109%
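
The activation density figure above can be read as the share of sampled tokens on which the neuron fires. A minimal sketch of that calculation follows, assuming density means the percentage of tokens whose activation exceeds a threshold of zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    neuron is active, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the neuron fires on 1 token out of 1,000.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3f}%")  # prints 0.100%
```

Under this reading, a density of 0.109% would correspond to the neuron firing on roughly one token in every 900.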