INDEX
Explanations
phrases or sentences indicating unfortunate situations or negative consequences
expressions of regret or disappointment
Negative Logits
tein      -0.77
ingham    -0.74
zag       -0.71
arnaev    -0.71
afort     -0.70
arij      -0.70
kindred   -0.69
icle      -0.67
ipers     -0.67
appro     -0.67
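Logit lists on feature cards like this one are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and taking the most negative and most positive entries. The card does not say how its values were computed, so this is a minimal sketch under that assumption. The names `top_bottom_logits`, `w_dec_row`, and `w_unembed` are hypothetical, and any final layer norm or rescaling the dashboard might apply is left out.

```python
import numpy as np


def top_bottom_logits(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by a feature's direct effect on the logits.

    w_dec_row: shape (d_model,), the feature's decoder direction (assumed input)
    w_unembed: shape (d_model, vocab_size), the model's unembedding matrix
    vocab:     list of token strings indexed by token id
    """
    # Direct logit effect of one unit of the feature on every vocabulary token.
    logits = np.asarray(w_dec_row) @ np.asarray(w_unembed)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.9],
                      [0.3, 0.1, -0.4]])
neg, pos = top_bottom_logits(w_dec_row, w_unembed, ["a", "b", "c"], k=1)
print(neg, pos)
```

Because the projection is linear, these values describe only the feature's direct path to the output and ignore any effect routed through later layers.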
Positive Logits
adolesc       0.71
nces          0.68
Delicious     0.68
imaru         0.68
ÃĽ            0.64
é¾į           0.64
ESA           0.63
reproduce     0.63
unfortunate   0.62
Tos           0.62
Activation Density: 0.016%
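Activation density is usually reported as the share of tokens in a sample on which the feature fires. A density of 0.016% works out to roughly 1 token in 6,250. The card does not state its sample or firing threshold. This sketch assumes that "fires" means an activation strictly above zero, and `activation_density` is a hypothetical name.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation for one feature exceeds `threshold`.

    acts: array of this feature's activations, one value per token (any shape).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: the feature fires on 16 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:16] = 2.5
print(f"{activation_density(acts):.3f}%")
```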