Explanations
things related to choices or options presented in a list
mentions of options or choices
Negative Logits
grave     -0.75
soever    -0.73
mind      -0.70
hemat     -0.67
urst      -0.67
ritic     -0.66
encers    -0.63
anti      -0.63
friend    -0.60
orks      -0.59

Positive Logits
options   1.10
option    0.91
choices   0.86
finder    0.82
atives    0.81
Options   0.79
nels      0.77
Option    0.76
izons     0.75
Option    0.73
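The two lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. A common way to derive such lists, assumed here and not confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k highest and k lowest scores. The sketch below shows that ranking step in plain Python. The names `logit_effects`, `top_and_bottom`, `DECODER_DIR`, and `TOY_UNEMBED` are hypothetical, and the 2-dimensional weights are made-up toy values, not the real model's.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive (descending) and k most negative (ascending) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])  # most negative first
    return positive, negative

# Hypothetical toy weights (d_model = 2), for illustration only.
DECODER_DIR = [1.0, 0.5]
TOY_UNEMBED = {
    "options": [1.0, 0.2],
    "choices": [0.8, 0.0],
    "grave": [-0.6, -0.3],
    "mind": [0.0, -1.0],
}

effects = logit_effects(DECODER_DIR, TOY_UNEMBED)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", [(t, round(v, 2)) for t, v in positive])
print("negative:", [(t, round(v, 2)) for t, v in negative])
```

In a real model, the unembedding would be a full `d_model x vocab` matrix, and a dashboard may apply extra scaling before ranking. The sort-and-slice step stays the same.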

Activation density: 0.037%
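Activation density is read here as the percentage of sampled tokens on which the feature fires. This is an assumption: "fires" is taken to mean an activation above zero. At 0.037%, that is roughly 37 active tokens in every 100,000, so the feature is very sparse. The sketch below computes the figure from per-token activations. `activation_density` and the toy `sample` are hypothetical, and the zero threshold is an assumed default.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 100,000 tokens, 37 of which activate the feature.
sample = [0.0] * 100_000
for i in range(0, 37 * 2000, 2000):  # 37 evenly spaced positions
    sample[i] = 1.5

print(f"{activation_density(sample):.3f}%")  # 0.037%
```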