INDEX
Explanations
place names and initialisms
Negative Logits
It      0.85
AT      0.77
t       0.70
re      0.68
an      0.68
for     0.68
that    0.68
it      0.66
that    0.64
et      0.64
Positive Logits
powied  0.68
были    0.68
ปี      0.68
,.      0.66
dolayı  0.61
;       0.60
написа  0.59
περιο   0.59
ز       0.59
σε      0.58
Activations Density: 0.766%