INDEX
Explanations
proper nouns, particularly names of individuals and places
Negative Logits
Token     Logit
iggins    -0.16
oice      -0.15
Abs       -0.14
inand     -0.14
culus     -0.14
UBE       -0.14
clair     -0.14
Zucker    -0.13
»¿        -0.13
335       -0.13
Positive Logits
Token     Logit
orz       0.18
exus      0.16
oggles    0.16
657       0.16
Rune      0.16
inton     0.15
Birliği   0.15
fold      0.15
.Dial     0.15
ussen     0.14
Activations Density: 0.189%