Explanations
verb followed by preposition
Negative Logits
of    0.44
اک    0.43
이야    0.41
Einstellungen    0.40
фаразы    0.40
含ま    0.40
ूज    0.39
ীষ    0.39
폿    0.39
этому    0.39
Positive Logits
ت    0.71
by    0.59
س    0.57
with    0.56
to    0.53
on    0.53
ina    0.49
с    0.49
from    0.48
et    0.47
Activations Density: 0.874%