INDEX
Explanations
No Explanations Found
Negative Logits
ка  1.47
un  1.45
Y  1.35
as  1.34
ˆ‚  1.28
ung  1.25
ko  1.22
у  1.19
tono  1.18
و  1.16
Positive Logits
means  1.60
sputum  1.55
cento  1.50
dır  1.47
produzir  1.47
daí  1.47
chamada  1.46
כך  1.46
malef  1.45
admis  1.43
Activations Density 0.123%