Explanations
phrases related to feelings of happiness and satisfaction
expressions of joy and delight
Negative Logits
eworld    -0.78
alcohol   -0.76
organic   -0.70
vernment  -0.67
ifted     -0.67
downed    -0.65
helle     -0.63
xia       -0.63
Rhod      -0.62
otypes    -0.62
Positive Logits
delight   1.24
fully     1.12
pleasure  0.95
urous     0.93
iously    0.91
aston     0.90
ously     0.89
fulness   0.86
ishly     0.85
theless   0.84
Activations Density 0.006%