Explanations
research papers
This neuron responds to words that name processes or actions, especially nominalizations such as “activities,” “allocation,” “identifying,” and “changes.”
Negative Logits
Token       Logit
tạp         -0.06
ů           -0.06
(Unknown    -0.06
明白        -0.06
uniqu       -0.06
xml         -0.06
interle     -0.06
Pool        -0.06
561         -0.06
Fraud       -0.06
Positive Logits
Token       Logit
Οι          0.07
الجديد      0.07
боль        0.06
reira       0.06
Acts        0.06
patiently   0.06
ope         0.06
tri         0.06
insics      0.06
istedi      0.06
Activations Density: 0.122%