Explanations
words related to negative emotions and consequences
suffixes indicating negative sentiments or conditions
Negative Logits
Token     Logit
inav      -0.78
herty     -0.76
sav       -0.72
ergy      -0.71
oret      -0.71
ITNESS    -0.68
vez       -0.67
ENN       -0.67
gres      -0.66
ellar     -0.65
Positive Logits
Token        Logit
ously        0.80
inflicted    0.75
thereof      0.72
Bastard      0.70
plag         0.70
perpetrated  0.70
quo          0.69
uous         0.69
OUS          0.69
naires       0.66
Activations Density 0.270%