INDEX
Explanations
the pronoun "I" and its variations in different contexts
Negative Logits
Eſ          -0.96
Efq         -0.88
Perſ        -0.88
Reſ         -0.88
Monfieur    -0.85
itſelf      -0.82
Theſe       -0.80
ſtre        -0.78
Houſe       -0.78
houſe       -0.77
Positive Logits
I           1.87
I           1.24
we          1.18
We          1.18
he          1.10
my          1.09
myself      1.02
我          1.00
You         0.99
i           0.96
Activations Density: 0.259%