INDEX
Explanations
words related to contrasting or negating statements
conjunctions and transition phrases in discourse
Negative Logits
Objects      -0.65
uggest       -0.64
chips        -0.60
lump         -0.59
grains       -0.59
rods         -0.58
overboard    -0.58
terness      -0.57
Heads        -0.57
slate        -0.57

Positive Logits
unia         0.80
albeit       0.79
um           0.79
uh           0.75
but          0.75
andom        0.73
except       0.68
necess       0.68
yet          0.68
except       0.67
Activations Density 0.230%