Explanations
signature
This neuron activates on words that signal a distinctive or hallmark style, e.g. “signature,” “trademark,” or “characteristic.”
Negative Logits
departments    -0.07
Αυ             -0.06
atical         -0.06
いか           -0.06
utilities      -0.06
.newLine       -0.06
Checked        -0.06
_related       -0.06
confiscated    -0.06
aides          -0.06
Positive Logits
BL             0.06
topics         0.06
선수           0.06
.basicConfig   0.06
BLOCK          0.06
.beginPath     0.06
deeply         0.06
العم           0.05
jot            0.05
GLES           0.05
Activations Density 0.023%