INDEX
NEGATIVE LOGITS
in        0.67
是        0.66
は        0.64
في        0.61
e         0.58
is        0.57
在        0.57
antara    0.56
و         0.55
عن        0.55
POSITIVE LOGITS
y             0.47
Variation     0.45
That          0.44
t             0.43
'.            0.42
themselves    0.39
conspiring    0.39
Who           0.39
tım           0.39
यों           0.38
Activations Density 0.266%