INDEX
Explanations
phrases related to existential or introspective questions about self-awareness and identity
Negative Logits
Token       Logit
preced      -0.59
alcanzó     -0.55
démocr      -0.52
sufficient  -0.50
énergé      -0.49
Lors        -0.48
Datuak      -0.48
AFFIRMED    -0.48
survives    -0.48
Sump        -0.47
Positive Logits
Token        Logit
doing        0.88
looking      0.84
making       0.83
trying       0.83
)            0.83
tvguidetime  0.83
正在         0.82
going        0.81
đang         0.78
Estou        0.78
Activations Density 0.374%
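
The density figure above is most naturally read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions with a
    non-zero activation. The threshold of 0.0 is illustrative.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 800 token positions.
sample = [0.0] * 797 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.374% would mean the feature is active on roughly 4 of every 1,000 tokens in the sampled text.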