INDEX
Explanations
characteristic
This neuron detects occurrences of the adjective “characteristic.”
Negative Logits
directly        -0.07
�               -0.06
upper           -0.06
:↵↵↵↵           -0.06
έας             -0.06
NL              -0.06
鉄              -0.06
neler           -0.06
tremendously    -0.06
duplicates      -0.06
Positive Logits
distinctive     0.11
trademark       0.10
characteristic  0.08
hallmark        0.07
ASF             0.07
लक              0.07
autobi          0.06
gracious        0.06
افة             0.06
charismatic     0.06
Activations Density 0.015%