INDEX
Explanations
instances of happiness or positive emotional expressions
Negative Logits
SAN     -0.15
ching   -0.15
å½      -0.14
layer   -0.14
hea     -0.14
age     -0.14
dra     -0.13
prop    -0.13
949     -0.13
aż      -0.13
Positive Logits
Dip     0.17
तम      0.16
ione    0.15
itech   0.15
frica   0.15
ιά      0.15
Ñĩина   0.15
isd     0.14
¶Į      0.14
apt     0.14
Activations Density: 0.028%