Explanations
expressions related to emotional experiences and responses
Negative Logits
Token      Logit
avou       -0.16
acket      -0.15
ewire      -0.15
ffen       -0.15
stp        -0.14
GBK        -0.14
aticon     -0.14
olare      -0.14
ンジ        -0.14
aler       -0.13
Positive Logits
Token      Logit
ham         0.16
420         0.16
felt        0.15
aigned      0.14
urn         0.14
aneous      0.14
627         0.14
яж          0.14
693         0.14
661         0.14
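The page does not say how these two lists were produced. A common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the resulting score; the minimal sketch below assumes that approach. The function names `logit_effects` and `top_and_bottom`, the toy tokens, and all numbers are hypothetical and are not taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir has one entry per residual-stream dimension and
    unembed has one row per residual dimension and one column per vocabulary
    token, so the result (decoder_dir @ unembed) has one score per token.
    """
    n_vocab = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(n_vocab)
    ]


def top_and_bottom(tokens: Sequence[str], scores: Sequence[float],
                   k: int = 10) -> Tuple[List[Tuple[str, float]],
                                         List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative score first
    top = ranked[::-1][:k]     # most positive score first
    return top, bottom


# Hypothetical toy example: 2 residual dimensions, 3 vocabulary tokens.
tokens = ["tok_a", "tok_b", "tok_c"]
decoder_dir = [1.0, -1.0]
unembed = [[0.5, 0.1, -0.2],
           [0.1, 0.3, 0.4]]
scores = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(tokens, scores, k=1)
print(top[0][0], bottom[0][0])  # tok_a tok_c
```

With k = 10 this would yield two ten-row lists shaped like the tables above. Real dashboards may also fold in layer-norm scaling before ranking, which this sketch ignores.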
Activation Density: 0.042%
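Activation density is usually read as the fraction of token positions on which a feature fires. The sketch below assumes that definition (an activation strictly above a threshold of 0) because the page does not define it. `activation_density` is a hypothetical helper, and the second part converts the reported 0.042% into a "1 in N tokens" rate.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations over 4 token positions: the feature fires on 1 of them.
print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 0.25

# A density of 0.042% means the feature fires on roughly 1 token in 2,381.
density = 0.042 / 100
print(round(1 / density))  # 2381
```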