Explanations
emotional reactions and feelings conveyed in the text
Negative Logits
erus        -0.15
anela       -0.14
/Peak       -0.14
zw          -0.14
iente       -0.14
izedName    -0.14
isz         -0.14
ahat        -0.13
spoiler     -0.13
elon        -0.13
Positive Logits
echo        0.15
others      0.15
ãĥ©ãĥĥãĤ¯   0.15
Ñĥг         0.15
us          0.14
achers      0.14
others      0.14
eÄį         0.14
.bootstrap  0.13
utr         0.13
Activations Density 0.170%