INDEX
Explanations
phrases related to making comparisons
instances of the word "compare" and its variations
Negative Logits
ktop     -0.81
der      -0.76
vous     -0.74
oğ       -0.72
UX       -0.68
gren     -0.67
liner    -0.67
ding     -0.66
Shift    -0.65
hoff     -0.65
Positive Logits
favorably      0.95
apples         0.89
isons          0.87
comparisons    0.83
sexes          0.79
Compare        0.78
uple           0.76
compare        0.71
onga           0.71
objectively    0.70
Activations Density 0.017%
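
One way to read the density figure is as the fraction of token positions on which the feature activates at all. The sketch below assumes that reading; the function name `activation_density`, the zero threshold, and the toy data are hypothetical illustrations, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 17 active positions out of 100,000 tokens.
toy = [1.5] * 17 + [0.0] * (100_000 - 17)
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.017% would correspond to roughly 17 active positions per 100,000 tokens.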