INDEX
Explanations
indications of choices or alternatives
the word "either."
NEGATIVE LOGITS
thal           -0.89
achus          -0.79
vironments     -0.72
appings        -0.72
emen           -0.70
acter          -0.69
vironment      -0.68
plates         -0.68
ulations       -0.68
orig           -0.67
POSITIVE LOGITS
side            0.93
way             0.79
consciously     0.78
willfully       0.72
directly        0.68
manually        0.67
starve          0.66
overtly         0.66
intentionally   0.64
omit            0.63
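The logit lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. A common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and take the highest and lowest entries. The sketch below assumes that method, ignores any layer-norm or rescaling the dashboard may apply, and uses a hypothetical helper `top_logit_effects` with toy numbers, not this feature's real weights.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the logits (hypothetical helper).

    Assumes w_dec_row is the feature's decoder direction, shape [d_model],
    and W_U is the unembedding matrix, shape [d_model, n_vocab].
    """
    effects = w_dec_row @ W_U            # one logit effect per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
w_dec_row = np.array([1.0, 0.0])
W_U = np.array([[0.9, -0.7, 0.1, 0.5],
                [0.0,  1.0, 1.0, 1.0]])
vocab = ["side", "thal", "orig", "way"]
negative, positive = top_logit_effects(w_dec_row, W_U, vocab, k=1)
# negative holds "thal" (-0.7); positive holds "side" (0.9)
```

Because only the decoder direction and unembedding are involved, this measures the feature's direct effect on the output, not its effect through later layers.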
Activation Density: 0.019%
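Activation density is usually reported as the percentage of tokens on which the feature fires, so 0.019% corresponds to roughly 19 activations per 100,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero; the helper name `activation_density` and the threshold default are illustrative assumptions, not the dashboard's documented definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens where the feature's activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(activations)
    # Count firing tokens and express them as a percentage of all tokens.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: a feature that fires on 19 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:19] = 1.0
density = activation_density(acts)  # about 0.019 (percent)
```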