INDEX
Explanations
adverbs that express a preference or choice
instances of the word "rather"
Negative Logits
DD      -0.77
amba    -0.74
anon    -0.72
ORN     -0.72
haw     -0.69
ppo     -0.67
onga    -0.66
mberg   -0.66
ongo    -0.65
ipl     -0.64
Positive Logits
rather    0.82
Ide       0.75
unpop     0.74
rather    0.72
bilt      0.71
itably    0.70
unimagin  0.69
metic     0.67
Neh       0.66
than      0.66
Activations Density 0.015%