INDEX
Explanations
Phrases indicating alternatives or contrasts.
Instances of a contrasting phrase or structure that begins with "Instead".
New Auto-Interp
Negative Logits
Token       Logit
Condition   -0.66
vez         -0.62
SF          -0.62
ASED        -0.61
ented       -0.59
ental       -0.58
CLUD        -0.58
ENTS        -0.57
gin         -0.57
AG          -0.55

Positive Logits
Token       Logit
ples         0.74
thereof      0.72
opting       0.71
of           0.69
ortun        0.68
ilon         0.68
achu         0.66
terness      0.66
,.           0.63
we           0.63
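The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). The source does not say how the dashboard computed these values, so the sketch below is only one plausible version: it assumes the effect is the feature's decoder direction multiplied by the model's unembedding matrix, with no final LayerNorm or rescaling. The function and argument names are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by one feature's direct effect on the output logits.

    Assumption (not from the source): effect = decoder_dir @ W_U, where
    decoder_dir has shape (d_model,) and W_U has shape (d_model, vocab_size).
    No LayerNorm or scaling is applied.
    """
    effects = decoder_dir @ W_U          # one logit effect per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
W_U = np.array([[0.5, -0.3, 0.9, 0.0],
                [0.2, 0.1, -0.4, 0.7]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print("negative:", neg)
print("positive:", pos)
```

In a real model the vocabulary would be the tokenizer's token strings, which is why subword fragments such as "ented" or "ortun" can appear in lists like the ones above.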
Activation density: 0.024%
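Activation density is presumably the fraction of token positions in some evaluation set on which the feature is active. The source gives neither the dataset nor the activation threshold, so this sketch assumes "active" means an activation strictly above 0. The helper names are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

def format_density(density):
    """Render a fraction as a percentage string, e.g. 0.00024 -> '0.024%'."""
    return f"{100 * density:.3f}%"

# Toy example: the feature fires on 12 of 50,000 token positions.
acts = np.zeros(50_000)
acts[:12] = 1.5
d = activation_density(acts)
print(format_density(d))
```

Under this reading, a density of 0.024% means the feature fires on roughly one token in 4,200.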