INDEX
Explanations
instances of comparisons or contrasts in the text
Negative Logits
Mountain    -0.15
ads         -0.15
imitives    -0.14
ú           -0.14
anki        -0.14
gi          -0.14
Freder      -0.13
Wings       -0.13
936         -0.13
ama         -0.13
Positive Logits
Helmet      0.15
ê»ĺ         0.15
nodoc       0.14
beck        0.14
客          0.14
tron        0.13
ullet       0.13
AVR         0.13
Sac         0.13
cf          0.13
Activations Density 0.110%