Explanations
phrases related to choices or options
references to different options or choices
Negative Logits
Token      Logit
urst       -0.75
soever     -0.75
hemat      -0.74
mind       -0.73
grave      -0.71
ritic      -0.66
tub        -0.64
rollers    -0.64
bey        -0.63
ric        -0.63
Positive Logits
Token      Logit
options    1.06
atives     0.85
option     0.84
finder     0.84
choices    0.84
Option     0.76
Options    0.75
Option     0.75
Altern     0.74
izons      0.73
Activations Density: 0.034%