INDEX
Explanations
"to" followed by other words
infinitive phrases
Negative Logits
an  0.75
ע  0.65
ü  0.63
മ  0.56
्यांना  0.55
ای  0.55
ла  0.54
와의  0.54
ী  0.54
ാ  0.54
POSITIVE LOGITS
is  0.61
t  0.59
l  0.59
।  0.57
h  0.55
n  0.55
()  0.54
.”  0.54
r  0.53
financeiros  0.53
Activations Density 0.038%