INDEX
Explanations
first-person pronouns and personal reflections
Negative Logits
Theſe       -1.15
Beſ         -1.05
Monfieur    -0.98
Efq         -0.88
Reſ         -0.87
ſeveral     -0.86
Anſ         -0.84
Eſ          -0.84
Diſ         -0.82
Padang      -0.79
Positive Logits
I           1.95
I           1.49
We          1.25
we          1.24
i           1.12
We          1.09
he          0.96
𝑰           0.91
tôi         0.91
He          0.89
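The two logit lists rank vocabulary tokens by how strongly this feature pushes the model's output toward them (positive) or away from them (negative). The sketch below assumes that, as is common for sparse-autoencoder dashboards, each score is the feature's decoder direction projected through the model's unembedding matrix. The function name `logit_effects` and the toy inputs are hypothetical and not taken from this page. Real dashboards may also apply normalization before projecting, which this sketch ignores.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) (token, score) pairs for one feature.

    Hypothetical helper: assumes score[t] = sum_i decoder_dir[i] * unembed[i][t].
    """
    d_model = len(decoder_dir)
    # Project the feature's decoder direction through the unembedding,
    # giving one logit contribution per vocabulary token.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
    vocab=["I", "We", "Theſe"],
    k=1,
)
print(neg, pos)  # [('Theſe', -1.0)] [('I', 2.0)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other.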
Activation Density 0.223%
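Activation density is read here as the share of token positions on which the feature fires at all. At 0.223%, that is roughly one token in 450. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold default are illustrative assumptions, not this dashboard's documented definition.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = 100 * (#active tokens) / (#tokens).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the position as "firing"
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 3 firing positions out of 1000 tokens -> 0.3%
print(activation_density([0.0] * 997 + [0.4, 1.2, 0.1]))
```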