INDEX
Explanations
Instances of the pronoun "we" and its variants that indicate collective experiences or actions.
Negative Logits
ectors  -0.15
our  -0.14
intention  -0.14
lúc  -0.14
intentions  -0.14
Dav  -0.14
.Doc  -0.14
çIJ  -0.14
iffe  -0.13
ô  -0.13
Positive Logits
شاÙĩد  0.17
hear  0.16
Ori  0.15
seeing  0.15
loser  0.15
awei  0.15
Integral  0.14
expecting  0.14
ibold  0.14
ÅĻeh  0.14
Activations Density 0.067%