Explanations
expressions that signal contradiction or contrast
Negative Logits
Token         Logit
يتيمه         -0.56
cektir        -0.54
lenker        -0.53
Theſe         -0.52
xFFFFFFFF     -0.51
GHIJKLM       -0.51
أما           -0.50
mukana        -0.50
первых        -0.50
Vordergrund   -0.47
Positive Logits
Token          Logit
yet            1.05
Yet            0.93
yet            0.90
Trotzdem       0.87
pourtant       0.87
Yet            0.86
despite        0.80
Trotz          0.79
YET            0.79
nevertheless   0.79
Activation Density: 0.286%
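
The density figure is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are assumptions made for illustration, not details taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) feature activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 tokens, of which 2 activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this definition, a density of 0.286% would correspond to roughly 286 active tokens per 100,000 sampled.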