INDEX
Explanations
sentences involving decision-making and recommendations
Negative Logits
intl     -0.15
qus      -0.15
ázi      -0.14
imbus    -0.14
iolet    -0.14
afx      -0.14
USIC     -0.14
гов      -0.14
blr      -0.14
YRO      -0.14
Positive Logits
opt      0.54
opt      0.47
choose   0.44
opted    0.43
chose    0.42
opts     0.40
opting   0.39
Opt      0.38
choose   0.38
chooses  0.37
Activations Density 0.275%
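
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 11 of 4000 tokens.
sample = [0.0] * 3989 + [1.2] * 11
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature that activates this rarely is sparse, which fits a narrow explanation such as the one listed for this feature.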