INDEX
Explanations
instances of the pronoun "we."
instances of the word "we."
Negative Logits
Token       Logit
him         -0.76
ragon       -0.62
yna         -0.58
emon        -0.55
incial      -0.54
ensed       -0.53
igun        -0.52
Override    -0.52
ounding     -0.51
him         -0.50
Positive Logits
Token       Logit
we          2.67
we          1.75
We          1.71
We          1.67
our         1.52
ourselves   1.51
WE          1.40
ours        1.39
Our         1.24
us          1.18
Activations Density: 0.170%