INDEX
Explanations
sentences containing contrasting statements or ideas
the word "but" to indicate contrast or exception
Negative Logits
roy      -0.85
oys      -0.75
oun      -0.71
ump      -0.69
itto     -0.69
osite    -0.68
uddin    -0.68
ands     -0.66
oop      -0.65
asar     -0.64
Positive Logits
tons             1.05
alas             1.04
nevertheless     1.03
nonetheless      0.99
fortunately      0.91
luckily          0.90
beware           0.83
insofar          0.82
hey              0.80
unfortunately    0.76
Activations Density: 0.177%