Explanations
proper nouns, particularly personal names
Negative Logits
Token     Logit
iou       -0.19
uis       -0.17
itet      -0.17
i         -0.17
itore     -0.17
ozy       -0.17
eson      -0.17
e         -0.16
idis      -0.16
ÄĻd       -0.16
Positive Logits
Token     Logit
apest     0.18
acity     0.15
lac       0.15
imentary  0.15
icrous    0.14
aim       0.14
رÙĬÙĥ     0.14
icial     0.14
verter    0.14
opia      0.14
Activations Density: 0.027%