Explanations
first-person and collective pronouns indicating personal involvement or experiences
Negative Logits
Token          Logit
íĴĪ            -0.16
ä¸ļ            -0.16
Insensitive    -0.16
zag            -0.14
аÑĶ            -0.14
ochen          -0.14
awa            -0.14
ŀ              -0.14
uvw            -0.14
275            -0.14
Positive Logits
Token          Logit
'd             0.54
’d             0.49
ll             0.32
'll            0.32
ll             0.31
d              0.28
.ll            0.27
’ll            0.27
"d             0.26
'D             0.25
Activations Density: 0.330%