Explanations
possessions, experiences, states
Negative Logits
ка      0.47
ك       0.46
০০      0.36
for     0.36
அவள்    0.32
,       0.32
다면    0.31
0       0.31
т       0.31
나      0.31
Positive Logits
been    0.66
ע       0.52
been    0.49
BEEN    0.43
Been    0.41
sido    0.40
had     0.36
Been    0.36
明显的  0.35
ס       0.34
Activation Density: 0.265%