Explanations
instances of the pronoun "I" and related personal statements or reflections
Negative Logits
ibold      -0.16
utan       -0.15
apon       -0.15
exo        -0.15
Paper      -0.15
Ye         -0.14
Elapsed    -0.13
Remarks    -0.13
AGON       -0.13
Clay       -0.13
Positive Logits
ever       0.20
memory     0.19
hadn       0.19
EVER       0.17
correctly  0.17
memory     0.16
entend     0.16
plx        0.15
were       0.15
라도       0.15
Activations Density: 0.033%