INDEX
Explanations
phrases indicating contrast and comparison
phrases indicating a comparison or contrast
Negative Logits
uble      -0.74
enos      -0.72
amon      -0.67
atile     -0.66
erest     -0.65
Clicker   -0.63
olis      -0.62
esome     -0.62
acha      -0.61
nce       -0.60
Positive Logits
preferably   0.83
etheless     0.77
excluding    0.75
cause        0.75
coerc        0.72
evidenced    0.72
preferring   0.71
ardless      0.69
including    0.68
theirs       0.68
Activations Density 0.290%