INDEX
Explanations
words related to preferences or choices
Negative Logits
brance         -1.01
gren           -0.70
idem           -0.70
orig           -0.68
breakers       -0.67
arta           -0.66
interrupted    -0.66
infeld         -0.65
bish           -0.64
breaker        -0.63
Positive Logits
rals           0.84
embodiments    0.81
anonymity      0.75
ably           0.71
embodiment     0.68
endings        0.65
preferring     0.64
solitude       0.63
simplicity     0.60
alternative    0.60
Activations Density: 0.022%