INDEX
Explanations
references to specific individuals, particularly names starting with the letter 'D'
Negative Logits
Token        Logit
æ¡IJ         -0.18
avis         -0.17
ÑĢÑĥг        -0.16
iaz          -0.16
ouble        -0.15
cour         -0.15
uty          -0.15
£i           -0.15
emons        -0.14
گار          -0.14
Positive Logits
Token        Logit
istrovstvÃŃ  0.19
opal         0.17
antan        0.16
nou          0.15
anel         0.15
allon        0.15
SB           0.15
(strict      0.14
ktop         0.14
blond        0.14
Activations Density: 0.046%