Explanations
Positive attributes
This neuron activates on long, multisyllabic content words, especially formal or technical nouns and adjectives (e.g. words built on Latin-derived prefixes or roots).
Negative Logits
,var    -0.07
亮    -0.06
,tr    -0.06
لفة    -0.06
imports    -0.06
�    -0.06
для    -0.06
licking    -0.06
holder    -0.06
株    -0.06
Positive Logits
showroom    0.07
/****************************************    0.06
AD    0.06
(proxy    0.06
Amt    0.06
paragraph    0.06
label    0.06
()?.    0.06
erst    0.06
.ManyToManyField    0.06
Activation density: 0.301%
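Activation density is usually read as the share of tokens on which the neuron fires. The sketch below assumes that "fires" means an activation strictly above a threshold (0.0 by default); the page does not state the exact threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`.

    Assumption: each entry in `activations` is the neuron's value on one token.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count the tokens on which the neuron is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this reading, a density of 0.301% means the neuron is active on roughly 3 of every 1,000 tokens, which fits a neuron tuned to a fairly specific class of words.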