INDEX
Explanations
phrases related to existential questions about self and consciousness
Negative Logits
quote          -0.72
FANT           -0.66
plex           -0.62
jl             -0.62
Anyway         -0.62
ibe            -0.61
Firstly        -0.61
Gil            -0.60
Annotations    -0.60
sites          -0.60
Positive Logits
likewise       0.84
maxwell        0.68
aeus           0.67
inferior       0.66
subtract       0.66
opposite       0.64
withdraw       0.64
similarly      0.63
nominate       0.62
nonexistent    0.61
Activations Density 0.281%