Explanations
of followed by noun
of approximately or multi
Negative Logits
on    1.00
a     0.91
ר     0.89
y     0.89
e     0.88
the   0.88
i     0.86
o     0.86
in    0.85
ا     0.85
Positive Logits
'     1.16
of    0.99
of    0.95
'،    0.88
<unused2231>    0.86
của   0.85
ного  0.84
của   0.81
Of    0.79
j     0.79
Activations Density: 1.378%