Explanations
phrases that indicate excellence or superiority
Negative Logits
Token        Logit
sson         -0.15
uster        -0.15
union        -0.15
stell        -0.14
oust         -0.14
ustom        -0.14
\Component   -0.14
antan        -0.14
atron        -0.14
ayla         -0.14
Positive Logits
Token        Logit
lia          0.16
rif          0.15
たく         0.14
otp          0.14
iales        0.14
elite        0.14
Nev          0.14
rane         0.14
alia         0.14
rides        0.13
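On dashboards of this kind, the negative and positive logit lists are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The sketch below assumes that method. It omits any normalization or final layer norm a particular tool may apply. `feature_logit_effects`, `decoder_row`, and `unembed` are hypothetical names. The returned token ids would still need to be mapped back to strings with the model's tokenizer.

```python
def feature_logit_effects(decoder_row, unembed, k=10):
    """Rank vocabulary tokens by the feature's direct effect on their logits.

    Assumption: decoder_row is the feature's decoder vector (length d_model),
    and unembed is the unembedding matrix given as d_model rows, each of
    length vocab_size. Both names are hypothetical.
    Returns (positive, negative) lists of (token_id, value) pairs.
    """
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    # Dot product of the decoder direction with each token's unembedding column.
    logits = [
        sum(decoder_row[d] * unembed[d][t] for d in range(d_model))
        for t in range(vocab_size)
    ]
    # Sort token ids from most negative to most positive effect.
    ranked = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(t, logits[t]) for t in ranked[:k]]
    positive = [(t, logits[t]) for t in ranked[::-1][:k]]
    return positive, negative


# Toy usage: d_model = 2, vocab_size = 3.
pos, neg = feature_logit_effects([1.0, -1.0], [[0.5, 0.0, -0.2], [0.1, 0.3, -0.4]], k=2)
print(pos, neg)
```

Ranking raw dot products shows which output tokens the feature pushes up or down directly. It ignores any effect the feature has through later layers.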
Activation Density: 0.023%
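Activation density is commonly taken to be the fraction of tokens in a sample on which the feature's activation is strictly positive. The sketch below assumes that definition. The function name `activation_density` and its `threshold` parameter are hypothetical. At 0.023%, the feature would fire on roughly 1 in every 4,300 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per token in the
    sample, and "active" means strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy usage: 23 active tokens out of 100,000, reported as a percentage.
sample = [0.0] * 99_977 + [0.5] * 23
print(f"{activation_density(sample) * 100:.3f}%")
```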