INDEX
Explanations
phrases contrasting two different perspectives or pieces of information
contrasting ideas or perspectives
Negative Logits
semb          -0.66
asley         -0.64
76561         -0.63
STAR          -0.63
Saras         -0.62
Annotations   -0.61
stein         -0.61
suffice       -0.59
before        -0.59
icago         -0.58
Positive Logits
srfAttach     0.77
opposite      0.73
)].           0.65
cul           0.65
Õ             0.64
second        0.63
latter        0.62
ouple         0.61
middle        0.60
cum           0.60
Activations Density 0.060%
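
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption (activation strictly greater than zero counts as firing); the function name `activation_density`, the threshold, and the sample values are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 3 firing positions out of 5000 tokens.
sample = [0.0] * 4997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.060%
```

A density this low would mean the feature is sparse: it is active on roughly 6 in every 10,000 tokens.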