INDEX
Explanations
the presence of personal pronouns indicating self-reference in various contexts
Negative Logits
forChild      -0.52
prostu        -0.50
pinulongan    -0.50
للمعارف       -0.47
uose          -0.46
máscara       -0.46
Yourself      -0.46
itſelf        -0.46
니까          -0.46
יוחד          -0.46
Positive Logits
hadn          1.01
ever          0.94
weren         0.93
anything      0.91
anything      0.86
Anything      0.74
Anything      0.72
ANYTHING      0.70
weren         0.68
EVER          0.68
Activations Density 0.122%