INDEX
Explanations
contrastive statements or qualifications in discourse
Negative Logits
Mero      -0.99
Dorian    -0.96
ſelf      -0.92
xenia     -0.89
gie       -0.88
Yerevan   -0.87
Nemo      -0.87
$_"       -0.87
NEO       -0.86
Nara      -0.86
Positive Logits
But       1.81
but       1.62
but       1.47
BUT       1.46
But       1.46
BUT       1.24
pero      1.16
nhưng     0.94
แต่       0.90
但        0.90
Activations Density: 0.132%