Explanations
words that convey positivity or desirable qualities
Negative Logits
afi         -0.16
isel        -0.15
irsch       -0.15
egend       -0.15
uther       -0.15
udder       -0.15
ottenham    -0.15
eoq         -0.15
bish        -0.15
ieron       -0.15
Positive Logits
than        0.28
_than       0.21
than        0.19
Than        0.18
THAN        0.18
-than       0.17
est         0.16
Zimmer      0.16
ÙĤت        0.15
312         0.15
Activations Density 0.171%