Explanations
phrases indicating a contrast or alternative
expressions that contrast or clarify ideas, often using the word "rather."
Negative Logits
Token      Logit
uay        -0.91
amba       -0.84
adium      -0.80
Yard       -0.75
ocaust     -0.74
aido       -0.73
iens       -0.71
arent      -0.71
ilty       -0.69
rival      -0.67
Positive Logits
Token            Logit
than             0.77
interestingly    0.67
informative      0.66
differentiate    0.66
FTWARE           0.65
amusing          0.64
distinguish      0.63
conservatism     0.62
akin             0.61
tame             0.60
Activations Density: 0.014%