INDEX
Explanations
phrases expressing approval or positive sentiment
positive expressions of achievement and well-being
Negative Logits
urst       -0.77
umed       -0.75
¶ħ         -0.74
ume        -0.74
uming      -0.73
uria       -0.72
olina      -0.71
ipers      -0.71
agram      -0.71
cessive    -0.69
Positive Logits
comrade          0.79
tid              0.72
reen             0.72
clus             0.70
folks            0.70
inconvenience    0.69
somebody         0.65
avoids           0.65
noon             0.64
sunshine         0.64
Activations Density: 0.293%