INDEX
Explanations
sample, placeholder, dummy, replica, example
NEGATIVE LOGITS
the         0.79
the         0.76
an          0.64
a           0.60
The         0.59
t           0.57
AKA         0.55
THE         0.53
thed        0.52
ti          0.51
POSITIVE LOGITS
Fmat        0.60
Ws          0.59
Ι           0.59
ንስ          0.55
clientes    0.55
plufieurs   0.54
ሎጂ          0.54
لأ          0.53
ഡ്          0.52
Texts       0.52
Activations Density 0.229%
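
As a rough illustration of how a density figure like the one above could be derived, the sketch below assumes that "Activations Density" means the percentage of token positions on which the feature's activation is above zero. That reading, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions made for illustration and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: density is counted per token position, and a position
    counts as active when its activation is strictly above the threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints "12.500%"
```

Under this reading, a density of 0.229% would correspond to the feature firing on roughly 2 or 3 of every 1,000 token positions.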