Explanations
cross-lingual representation
Negative Logits
зив           0.42
ruthless      0.42
alya          0.41
aa            0.41
ซีน           0.40
universities  0.40
тук           0.39
Wissenschaft  0.39
andinavian    0.39
වැඩ           0.39
Positive Logits
مرد           0.41
উদ্বাস্ত      0.41
anu           0.39
ንድ            0.39
ንዱ            0.39
पिच           0.39
Segundo       0.38
ناح           0.38
巴            0.38
teammates     0.37
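The dashboard does not say how the positive and negative logit lists were produced. One common approach, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix. That gives one score per vocabulary token. The highest scores form the positive list and the lowest form the negative list. The sketch below is plain Python on toy data. The names `decoder_direction`, `unembed`, and `vocab` are hypothetical stand-ins for the real SAE decoder row, the model's unembedding matrix, and the tokenizer vocabulary.

```python
from typing import List, Tuple


def logit_effects(decoder_direction: List[float],
                  unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model x vocab_size), giving one score per token.
    Assumption: this mirrors how the dashboard's logit lists are computed."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(len(decoder_direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom_tokens(scores: List[float], vocab: List[str], k: int
                          ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                    # most negative first
    positive = list(reversed(ranked[-k:]))   # most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["a", "b", "c", "d"]
    unembed = [[1.0, 0.0, -1.0, 0.5],
               [0.0, 1.0, 0.0, -1.0]]
    decoder_direction = [1.0, 0.5]
    scores = logit_effects(decoder_direction, unembed)
    positive, negative = top_and_bottom_tokens(scores, vocab, k=2)
    print(positive)  # [('a', 1.0), ('b', 0.5)]
    print(negative)  # [('c', -1.0), ('d', 0.0)]
```

The functions sort the full vocabulary for clarity. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid the full sort.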
Activation Density: 0.003%
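An activation density of 0.003% means the feature fires on roughly 3 of every 100,000 token positions, so it is a very sparse feature. The sketch below shows the arithmetic. It assumes that "active" means an activation strictly greater than a threshold of 0. The `activation_density` helper and that threshold are illustrative assumptions, not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds
    `threshold` (assumed to be 0.0; the dashboard's exact rule is not stated)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    return active / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample: 3 active positions out of 100,000 tokens.
    activations = [0.0] * 99_997 + [1.2, 0.7, 2.5]
    density = activation_density(activations)
    print(f"{density:.3%}")  # 0.003%
```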