INDEX
Explanations
grammatical endings across languages
Negative Logits
I          0.54
ieten      0.43
iamo       0.43
卞         0.41
อย่าง      0.41
ก็         0.38
ooker      0.38
ijen       0.38
unu        0.37
iveness    0.37
Positive Logits
ar         0.82
et         0.80
an         0.76
us         0.66
ur         0.64
на         0.64
ad         0.63
p          0.63
ab         0.61
and        0.61
Activations Density: 0.097%