INDEX
Explanations
first-person pronouns and expressions of personal experience or opinion
Negative Logits
ibold      -0.15
.tc        -0.15
Jensen     -0.14
621        -0.14
utan       -0.14
Paper      -0.14
ramento    -0.14
à¸Ĭม       -0.14
ichel      -0.13
jom        -0.13
Positive Logits
may        0.22
may        0.22
?url       0.16
May        0.16
squ        0.16
could      0.15
ever       0.15
encount    0.15
tek        0.15
_MAY       0.15
Activations Density 0.028%
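
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes "density" means the fraction of positions with an activation above zero, which the source does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.028% would correspond to the feature firing on roughly 28 of every 100,000 token positions.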