INDEX
Explanations
words related to contrasting or stating exceptions
the word "but" in various contexts
Negative Logits
roy      -0.74
ump      -0.67
ogical   -0.65
uto      -0.65
tnc      -0.64
ogo      -0.63
uly      -0.63
edu      -0.61
ands     -0.61
urdy     -0.61
Positive Logits
tons            1.22
alas            1.05
nevertheless    0.98
chery           0.93
fortunately     0.91
unfortunately   0.89
nonetheless     0.89
luckily         0.88
chers           0.85
preferably      0.84
Activations Density: 0.219%