INDEX
Explanations
the pronoun "I" and its variations, indicating a focus on self-reference
Negative Logits
Token           Logit
itſelf          -0.86
Theſe           -0.84
ſelves          -0.79
Houſe           -0.78
openConnection  -0.77
SEGUIR          -0.74
tanleria        -0.73
">:             -0.73
Efq             -0.73
themſelves      -0.72
Positive Logits
Token  Logit
am     1.37
Im     1.07
Im     1.01
Am     1.00
m      0.93
Am     0.93
im     0.92
am     0.90
Iam    0.89
I      0.88
Activations Density: 0.051%