INDEX
Explanations
phrases indicating a comparison or contrast between two ideas
transitional phrases indicating contrast or comparison
Negative Logits
IQ              -0.70
UNCLASSIFIED    -0.63
stead           -0.60
DL              -0.58
ilty            -0.58
Estimated       -0.55
compr           -0.55
GPU             -0.54
co              -0.54
coat            -0.53
Positive Logits
relying         0.74
let             0.72
SPONSORED       0.70
ours            0.66
mere            0.64
Instead         0.61
opting          0.59
invoke          0.59
recomb          0.58
capit           0.58
Activations Density 0.088%