INDEX
Explanations
references to the self or personal identity
Negative Logits
Token         Logit
Monfieur      -1.46
Efq           -1.31
Theſe         -1.24
ſeveral       -1.22
Houſe         -1.19
itſelf        -1.16
houſe         -1.15
'\\;'         -1.14
purpoſe       -1.13
themſelves    -1.10
Positive Logits
Token         Logit
Me            1.58
me            1.49
Me            1.37
ME            1.23
I             1.17
me            1.05
ME            1.00
I             0.88
मे            0.87
Meier         0.86
Activations Density 0.042%