INDEX
Explanations
mentions of specific names or proper nouns
Negative Logits
imum      -0.17
uve       -0.17
ibar      -0.17
ultz      -0.16
ubat      -0.16
ainer     -0.15
inesis    -0.15
ember     -0.15
rame      -0.15
uji       -0.15
Positive Logits
ann       0.19
ond       0.18
ml        0.18
opoulos   0.15
ual       0.15
ichen     0.15
UAL       0.15
OMETRY    0.15
olini     0.15
gag       0.14
Activations Density 0.060%