INDEX
Explanations
verbs related to decision-making and preferences
phrases related to preferences or tendencies
Negative Logits
AIDS        -0.68
ankind      -0.66
violates    -0.65
harm        -0.64
awaited     -0.63
soever      -0.62
Govern      -0.62
sylvania    -0.61
lihood      -0.61
itous       -0.61
Positive Logits
preferring  0.81
iet         0.77
preferred   0.77
caution     0.72
ãĤ©         0.72
foc         0.70
oof         0.70
Preferred   0.70
prefer      0.70
prioritize  0.70
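
The token shown as ãĤ© in the positive logits looks like encoding debris, but it may simply be the printable form that a byte-level BPE vocabulary gives to raw bytes. The sketch below assumes the dashboard's tokenizer uses the GPT-2 byte-to-unicode mapping (an assumption, since the page does not name the tokenizer); `decode_token` is a hypothetical helper name used only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted above U+0100.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to the text it encodes (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ©"))
```

Under that assumption, the three characters stand for the UTF-8 bytes E3 82 A9, which is the katakana character ォ, while plain ASCII tokens such as `prefer` decode to themselves.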
Activations Density 0.739%
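
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. A minimal sketch of that calculation follows, assuming this definition and a zero threshold; the function name and the toy activation list are hypothetical and are not taken from the dashboard's data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 739 active tokens out of 100,000.
acts = [0.0] * 99_261 + [1.5] * 739
print(f"{activation_density(acts):.3%}")
```

With this toy sample the feature is active on 739 of 100,000 tokens, which matches the scale of the figure reported above: a feature that fires on well under 1% of tokens.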