INDEX
Explanations
references to specific geographical locations or proper nouns
Negative Logits
Token       Logit
angan       -0.15
UEL         -0.15
uels        -0.15
ived        -0.14
ounge       -0.14
.Strict     -0.14
uell        -0.14
prioritize  -0.14
ivos        -0.14
quette      -0.13
Positive Logits
Token       Logit
bing        0.28
bed         0.20
ub          0.20
ilee        0.19
rique       0.19
erculosis   0.18
ric         0.17
berman      0.17
ernal       0.16
leshoot     0.16
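The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. One common way such lists are produced is to project the feature's decoder direction through the model's unembedding matrix (W_U) and keep the k highest and k lowest entries. The sketch below assumes that method; the function names, toy vectors and three-token vocabulary are hypothetical, and a real dashboard may also apply layer-norm scaling or filter by sign before ranking.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through W_U.

    decoder_dir: length d_model.
    unembed: d_model rows, each of length vocab_size.
    Returns one logit effect per vocabulary token (decoder_dir @ W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest and k smallest effects as (token, value) pairs.

    Positive list is ordered largest first; negative list smallest first.
    No sign filtering is done, so with tiny vocabularies the "negative"
    list can contain tokens whose effect is above zero.
    """
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 tokens (all values made up).
    effects = logit_effects([1.0, 0.5], [[0.2, -0.1, 0.1], [0.1, -0.1, 0.0]])
    pos, neg = top_and_bottom(effects, ["bing", "angan", "bed"], k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and slicing from both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest` / `heapq.nsmallest`) would avoid a full sort.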
Activation Density: 0.026%
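Activation density is usually read as the share of tokens on which the feature fires at all. The sketch below assumes that definition (activation strictly above a threshold of zero); the `threshold` parameter and the token counts in the example are hypothetical and only illustrate what a figure like 0.026% means, namely about 26 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 26 of 100,000 tokens.
    acts = [0.0] * 99_974 + [1.0] * 26
    density = activation_density(acts)
    print(f"Activation Density: {density * 100:.3f}%")  # prints "Activation Density: 0.026%"
```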