INDEX
Explanations
phrases related to making decisions or offering suggestions
Negative Logits
invariably      -0.66
ALWAYS          -0.62
relentlessly    -0.60
uniformly       -0.60
ocracy          -0.59
overwhelmingly  -0.58
always          -0.58
ulner           -0.57
basically       -0.57
constantly      -0.55
Positive Logits
someday          1.13
tempted          0.81
inadvertently    0.80
unintentionally  0.74
misunder         0.73
unwittingly      0.73
xus              0.70
ivably           0.68
TAIN             0.67
somew            0.65
Activations Density: 0.276%