Explanations
first-person pronouns indicating personal experiences or feelings

Negative logits
Token        Logit
Theſe        -1.08
Beſ          -1.03
Monfieur     -0.90
Efq          -0.84
Padang       -0.83
Reſ          -0.81
ſeveral      -0.81
Eſ           -0.81
themſelves   -0.80
CER          -0.78

Positive logits
Token        Logit
I             2.06
I             1.61
i             1.21
We            1.11
we            1.05
We            0.99
Me            0.97
𝗜             0.97
My            0.96
𝑰             0.95
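
The two logit lists read as the tokens whose output logits this feature most raises or lowers. Below is a minimal sketch of how such lists could be produced, assuming (the page does not say so) that each token's value is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, ignoring the final layer norm. The helpers `logit_effects` and `top_and_bottom`, and the toy matrix, vocabulary, and numbers, are all hypothetical. Pairs are kept as a list rather than a dict because visually identical tokens (such as the two "I" rows, which likely differ in leading whitespace) must stay distinct.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Assumed logit effect per token: dot(decoder_dir, unembed[:, t]).

    `unembed` is a d_model x vocab_size matrix given as a list of rows.
    """
    effects = []
    for t, token in enumerate(vocab):
        value = sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        effects.append((token, value))
    return effects

def top_and_bottom(effects: List[Tuple[str, float]], k: int = 10):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["I", "We", "These", "Esq"]
unembed = [
    [1.0, 0.5, -0.4, -0.2],
    [0.5, 0.5, -0.3, -0.1],
]
decoder_dir = [1.0, 2.0]
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print([t for t, _ in pos])  # ['I', 'We']
print([t for t, _ in neg])  # ['These', 'Esq']
```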

Activation density: 0.276%
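
Activation density is commonly reported as the percentage of tokens in a sample on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the page states neither the threshold nor the sample size); `activation_density` and the sample below are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: the feature fires on 276 of 100,000 tokens.
sample = [1.0] * 276 + [0.0] * (100_000 - 276)
print(f"{activation_density(sample):.3f}%")  # 0.276%
```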