INDEX
Explanations
pronouns referring to oneself or others, and verbs indicating a change or current state of being
questions and statements about identity and self-perception
Negative Logits
arya        -0.74
breeze      -0.68
artifacts   -0.66
margin      -0.61
clearance   -0.60
hazard      -0.60
Checking    -0.59
Clever      -0.56
elsius      -0.56
ragon       -0.55
Positive Logits
Reloaded    0.70
profess     0.68
uably       0.68
pretended   0.64
rir         0.64
rencies     0.63
Self        0.62
uer         0.61
portrays    0.60
Become      0.59
Activation Density: 0.115%