INDEX
Explanations
proper nouns, particularly names and places
Negative Logits
Token     Logit
urt       -0.15
icans     -0.15
è¾ij      -0.15
(class    -0.14
Matth     -0.14
ism       -0.14
terior    -0.14
atern     -0.13
ÃŃ        -0.13
urgeon    -0.13
Positive Logits
Token     Logit
urette    0.16
šov       0.16
intr      0.16
ÑĢоÑĪ     0.15
äge       0.15
shore     0.15
ÑĢож      0.14
aget      0.14
Ñĥв       0.14
APA       0.13
Activations Density 0.033%