INDEX
Explanations
phrases related to self-awareness and personal realization
statements about personal identity and self-awareness
Negative Logits
pedia      -0.63
aughs      -0.62
freezes    -0.61
utions     -0.61
ibaba      -0.61
downs      -0.60
fty        -0.60
ysis       -0.60
defends    -0.59
Scotia     -0.59
Positive Logits
indeed         0.94
nt             0.92
somehow        0.77
truly          0.75
actually       0.72
intimately     0.72
irre           0.72
indispensable  0.71
willing        0.68
behold         0.68
Activations Density 0.637%