INDEX
Explanations
phrases expressing contrast or contradiction
phrases that introduce contrasting ideas or elaborations
Negative Logits
olars   -0.87
irm     -0.70
uay     -0.68
orc     -0.67
ory     -0.65
croft   -0.64
utch    -0.63
tty     -0.63
itch    -0.63
miss    -0.63
Positive Logits
also             0.72
Thumbnails       0.63
chery            0.61
Recomm           0.61
ALSO             0.61
cially           0.60
DES              0.60
simultaneously   0.60
also             0.59
crabs            0.58
Activations Density 0.134%