Explanations
instances of the word "but," indicating a contrast or an exception in the text
Negative Logits
and            -0.76
himself        -0.66
herself        -0.64
THOUGH         -0.63
Though         -0.62
Though         -0.58
entanto        -0.56
Accordingly    -0.56
them           -0.51
bzw            -0.50
Positive Logits
then             1.39
hey              1.18
alas             1.15
unfortunately    1.06
also             1.03
it               0.97
luckily          0.96
yeah             0.94
if               0.94
chery            0.93
Activations Density 0.160%