INDEX
Explanations
phrases expressing possession or experiences
Negative Logits
itſelf        -1.17
themſelves    -1.05
Efq           -1.04
himſelf       -0.96
Monfieur      -0.90
referenties   -0.87
שוליים        -0.86
Theſe         -0.85
ſelves        -0.85
Jefus         -0.85
Positive Logits
I         1.23
I         0.95
my        0.93
My        0.84
myself    0.84
My        0.79
私は      0.78
我在      0.74
我也      0.72
ıyorum    0.72
Activations Density 0.306%
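
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of token positions where the feature's activation is above zero; the page does not state this definition. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position; the source page
    does not define how its figure is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 12.500%
```

Under this reading, a density of 0.306% would mean the feature fires on roughly 3 of every 1,000 tokens.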