Explanations
instances of strong emotional expressions or reactions
Negative Logits
Token         Logit
assage        -0.16
loe           -0.15
ÄĻk           -0.15
intColor      -0.14
blr           -0.14
atement       -0.14
isses         -0.14
ãĥ¥ãĥ¼        -0.14
fragistics    -0.14
znám          -0.13
Positive Logits
Token         Logit
there         0.15
Bilg          0.14
aden          0.14
hon           0.14
maybe         0.14
mere          0.12
Carolyn       0.12
yp            0.12
no            0.12
Bloss         0.12
Activations Density: 0.124%