INDEX
Explanations
the word "yet," indicating a sense of contrast or opposition
Negative Logits
</b>            -0.78
a               -0.72
</i>            -0.71
bum             -0.67
.               -0.67
Crusoe          -0.66
ia              -0.66
PRO             -0.65
webdriver       -0.65
PRO             -0.64
Positive Logits
YET             1.28
Yet             1.26
yet             1.24
yet             1.23
Yet             1.19
theless         1.06
Pourtant        0.96
TestingModule   0.92
Doch            0.92
INTERESAR       0.92
Activations Density: 0.044%