Explanations
capital cities of countries
Negative Logits
と共に    0.38
从而    0.36
さんと    0.36
거고    0.36
туда    0.35
所で    0.35
그래서    0.34
بتوان    0.34
যাদের    0.33
别人的    0.33
Positive Logits
consists    0.82
είναι    0.76
is    0.76
are    0.71
differs    0.67
jsou    0.60
involves    0.59
comprises    0.59
bestaat    0.57
transcends    0.57
Activations Density: 0.090%