INDEX
Explanations
phrases signaling intention or preference
Negative Logits
Ữ          -0.53
therefore  -0.52
λοι        -0.50
Rè         -0.50
YesNo      -0.50
thus       -0.49
tuce       -0.49
ösungen    -0.49
จึง        -0.49
salu       -0.48
Positive Logits
nonetheless   0.80
nevertheless  0.73
trotzdem      0.71
それにしても  0.70
それでも      0.66
$")           0.65
still         0.60
</tfoot>      0.60
卻            0.59
")->          0.56
Activations Density: 0.918%