Explanations
words expressing negative emotions or outcomes
expressions of regret or misfortune
Negative Logits
arnaev      -0.78
addons      -0.69
rouse       -0.67
ĸļ          -0.67
rounder     -0.67
aver        -0.66
aeda        -0.64
kefeller    -0.64
afort       -0.64
arij        -0.64
Positive Logits
,           0.91
enough      0.90
,...        0.80
for         0.74
though      0.74
neither     0.73
however     0.70
alas        0.69
there       0.68
none        0.67
Activations Density: 0.058%