INDEX
Explanations
foreign language conjunctions
NEGATIVE LOGITS
Contrary  0.39
asas  0.38
Relations  0.38
ruining  0.38
<unused94>  0.38
relations  0.38
opponent  0.38
dumping  0.37
financial  0.37
Quand  0.37
POSITIVE LOGITS
možda  0.43
或许  0.42
ຕິດຕ  0.41
或許  0.41
شاید  0.40
kanske  0.38
uitgenodigd  0.38
ך  0.38
参  0.38
यामुळे  0.38
Activations Density: 0.000%