Explanations
mentions of the word "me" indicating self-reference or personal perspective
Negative Logits
Monfieur      -1.50
Efq           -1.36
houſe         -1.28
Houſe         -1.28
'\\;'         -1.27
itſelf        -1.26
themſelves    -1.25
ſeveral       -1.25
Theſe         -1.25
purpoſe       -1.19
Positive Logits
Me            1.44
me            1.35
I             1.24
Me            1.22
ME            1.12
me            0.99
my            0.98
Myself        0.94
I             0.92
My            0.88
Activation density: 0.040%