Explanations
positive emotional words related to happiness and joy
words related to happiness and cheerful descriptors
Negative Logits
Token     Logit
elig      -0.86
ADRA      -0.81
avis      -0.81
itu       -0.79
tein      -0.73
rav       -0.70
ainer     -0.70
ires      -0.69
detrim    -0.68
rab       -0.68
Positive Logits
Token     Logit
ppy       1.45
zzy       0.85
weed      0.79
fuzz      0.71
pants     0.70
seas      0.69
opy       0.69
yy        0.69
glers     0.68
terness   0.67
Activation Density: 0.009%