Explanations
first-person pronouns and expressions of personal opinions or experiences
Negative Logits
Token     Logit
ÙĨب       -0.14
azzo      -0.14
inson     -0.14
ût        -0.14
让æĪij     -0.14
ochen     -0.14
uter      -0.13
کت        -0.13
anne      -0.13
eks       -0.13
Positive Logits
Token     Logit
wonder    0.22
Agree     0.19
agree     0.17
Wonder    0.16
ilik      0.16
427       0.15
agree     0.15
wondered  0.15
meant     0.15
arda      0.14
Activations Density 0.195%