INDEX
Explanations
phrases starting with "with"
with + descriptive words
NEGATIVE LOGITS
ка  0.89
ara  0.80
ak  0.73
in  0.64
في  0.64
ని  0.63
써  0.63
inactivació  0.62
ada  0.62
reservados  0.62
POSITIVE LOGITS
with  1.31
y  1.20
t  1.03
ב  1.00
ED  0.94
ת  0.93
l  0.93
は  0.91
with  0.89
WITH  0.89
Activations Density: 0.645%