INDEX
Negative Logits
oftentimes  0.51
管控  0.48
ഷണ  0.45
каждой  0.45
づくり  0.44
umuza  0.43
每一個  0.43
এটির  0.42
堣  0.42
முழுவதும்  0.42
Positive Logits
was  0.49
did  0.42
underlines  0.40
went  0.40
sset  0.39
advice  0.39
stung  0.39
travelled  0.38
suffered  0.38
Só  0.38
Activations Density 0.001%