Explanations
phrases expressing wishes or gratitude
expressions of desire or preference
Negative Logits
ingen          -0.75
onut           -0.71
VERTISEMENT    -0.71
angular        -0.69
livious        -0.69
infeld         -0.68
cit            -0.66
lass           -0.66
illusion       -0.66
may            -0.65
Positive Logits
revenge        0.81
lihood         0.77
lier           0.75
liest          0.71
forgiveness    0.70
retribution    0.70
redress        0.69
predec         0.68
vengeance      0.64
warr           0.64
Activations Density: 0.034%