INDEX
Explanations
instances related to self-reflection, personal experience, and self-identity
Negative Logits
Token    Logit
XIII     -0.71
heny     -0.68
cheon    -0.68
SHIP     -0.68
rice     -0.68
ī        -0.67
ondo     -0.66
Ashe     -0.66
endar    -0.64
yk       -0.63
Positive Logits
Token          Logit
destruct       1.08
conscious      0.88
same           0.87
lessly         0.85
destruct       0.82
proclaimed     0.82
explanatory    0.81
esteem         0.79
conscious      0.76
pres           0.75
Activations Density: 5.191%