INDEX
Explanations
phrases indicating contrast or opposition
Negative Logits
Token           Logit
'               -0.69
WA              -0.55
;               -0.54
zeiro           -0.52
X               -0.51
kem             -0.51
vov             -0.51
Ge              -0.51
ћа              -0.51
ibm             -0.50
Positive Logits
Token           Logit
ostante         1.88
despite         1.55
despite         1.48
Trotz           1.46
Despite         1.45
Despite         1.41
nonostante      1.41
withstanding    1.36
Malgré          1.34
spite           1.33
Activations Density: 0.092%