Explanations
proper nouns, particularly names and places
Negative Logits
Token     Logit
tik       -0.17
bedo      -0.15
ulled     -0.15
quia      -0.15
platz     -0.15
keh       -0.14
agli      -0.14
ชน        -0.14
raz       -0.14
SYS       -0.14
Positive Logits
Token     Logit
erson     0.31
ley       0.29
son       0.28
ford      0.27
ston      0.27
lington   0.27
ington    0.26
ison      0.25
ton       0.25
field     0.25
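The two logit tables list the ten tokens this feature pushes up most and the ten it pushes down most. A minimal sketch of that selection step follows, assuming each token already has a single "logit effect" score (commonly taken to be the feature's decoder direction dotted with the token's unembedding vector; that projection is an assumption here and is not shown). The function name `top_logits` is hypothetical, and the sample dictionary reuses a few values from the tables.

```python
import heapq

def top_logits(logit_effects, k=10):
    """Hypothetical helper: return the k most positive and k most negative
    (token, logit effect) pairs from a dict of token -> score."""
    items = logit_effects.items()
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative

# A few scores copied from the tables; a real dashboard would score the full vocabulary.
effects = {"erson": 0.31, "ley": 0.29, "son": 0.28, "ford": 0.27,
           "tik": -0.17, "bedo": -0.15, "SYS": -0.14}
pos, neg = top_logits(effects, k=3)
print(pos)  # [('erson', 0.31), ('ley', 0.29), ('son', 0.28)]
print(neg)  # [('tik', -0.17), ('bedo', -0.15), ('SYS', -0.14)]
```

Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only the top few entries are needed.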
Activation Density: 0.228%
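Activation density is the share of token positions on which the feature fires, so 0.228% means roughly 2.28 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero (the threshold is an assumption, exposed as a parameter); the function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose feature
    activation exceeds `threshold` (assumed to be 0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 1,000 token positions, the feature fires on 2 of them.
acts = [0.0] * 1000
acts[17] = 3.2
acts[402] = 1.1
density = activation_density(acts)
print(f"{density:.3%}")  # 0.200%
```

A sparse feature like this one needs a large token sample before the density estimate stabilizes, since a few extra firings shift the percentage noticeably.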