Explanations
references to self-identification or self-references in the text
references to self-identification
Negative Logits
pour      -0.76
onga      -0.75
ibaba     -0.71
icion     -0.70
rought    -0.68
edia      -0.68
iard      -0.67
iens      -0.66
oos       -0.66
heny      -0.63
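The source does not say how these logit scores are computed. A common approach for SAE dashboards is to take the dot product of the feature's decoder direction with each token's unembedding column, then list the highest and lowest results. The sketch below assumes that approach and uses a tiny, made-up model. The names `logit_effects` and `top_and_bottom` and all numbers are hypothetical illustrations, not the dashboard's real data or code.

```python
# Hypothetical sketch: score every vocabulary token by how much a feature's
# decoder direction would raise (positive) or lower (negative) its logit.
# ASSUMPTION: score_j = decoder_vec . unembed[:, j]; the source does not confirm this.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, score) pairs; unembed is a d_model x vocab_size matrix."""
    d_model = len(decoder_vec)
    scores = []
    for j, tok in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens and the k lowest (most negative first)."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Toy example with d_model = 2 and a four-token vocabulary (made-up values).
vocab = ["selves", "self", "pour", "onga"]
decoder_vec = [1.0, 0.5]
unembed = [
    [1.0, 0.6, -0.7, -0.5],
    [0.2, 0.2, -0.1, -0.4],
]
positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
print(positive)  # highest scores first
print(negative)  # most negative first
```

With these toy values, "selves" and "self" rank as positive logits, and "pour" and "onga" rank as negative ones. This mirrors the layout of the lists here, but the example's numbers are invented.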
Positive Logits
selves        1.06
worshipped    0.81
selves        0.77
æ³            0.73
self          0.70
è£ıè          0.69
ç¥ŀ           0.68
creatively    0.68
adherent      0.67
acknow        0.67
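The non-ASCII entries (æ³, è£ıè, ç¥ŀ) look like byte-level BPE token strings, not corrupted text. In that scheme, each character stands for one raw byte. The source does not name the tokenizer, so this sketch assumes GPT-2's byte-to-unicode table and maps such tokens back to bytes and UTF-8. Under that assumption, ç¥ŀ decodes to 神 ("god", "spirit"). æ³ and è£ıè are only fragments of multi-byte characters, so they cannot decode cleanly on their own. The helper names are hypothetical.

```python
# Sketch assuming GPT-2-style byte-level BPE: every byte 0-255 is shown as a
# printable Unicode character, so a token string can be mapped back to bytes.

def bytes_to_unicode():
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:  # unprintable bytes are shifted to code points 256 + n
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map each displayed character of a token back to its raw byte."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def token_to_text(token):
    # errors="replace" turns a lone fragment of a multi-byte character into U+FFFD.
    return token_to_bytes(token).decode("utf-8", errors="replace")


for tok in ["ç¥ŀ", "æ³", "è£ıè", "selves"]:
    print(repr(tok), token_to_bytes(tok), repr(token_to_text(tok)))
```

This reading suggests the token list is related to worship and deities ("worshipped", "adherent", 神) as well as self-reference. It remains an interpretation that depends on the tokenizer assumption.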
Activation Density: 0.035%
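Activation density is usually read as the percentage of evaluated tokens on which the feature fires. The source does not define the metric or the dataset, so this sketch assumes "active" means an activation above zero. Under that reading, 0.035% means roughly 35 of every 100,000 tokens, which makes this a very sparse feature. The function name and threshold parameter are hypothetical.

```python
# Sketch: activation density as the percentage of tokens whose feature
# activation exceeds a threshold. ASSUMPTION: threshold 0.0 counts as "active".

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 35 active tokens out of 100,000 gives 0.035%.
acts = [0.0] * 99_965 + [1.0] * 35
print(f"{activation_density(acts):.3f}%")
```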