INDEX
Explanations
English or other-language words
Negative Logits
beak        0.38
ÕES         0.37
NPA         0.37
impersonal  0.36
seben       0.36
pire        0.36
고자        0.35
selfless    0.35
predic      0.34
CPA         0.34

Positive Logits
ம           0.46
鞑          0.43
qualité     0.42
bình        0.41
पहाड़ी       0.41
adlı        0.39
]//         0.38
সম          0.38
ровать      0.38
नंद         0.37
Activations Density 0.000%