Explanations
contrasting proprietary vs open
Negative Logits
was        1.04
renormal   1.02
and        0.96
reorgan    0.96
{          0.95
is         0.89
ва         0.88
emergence  0.88
είχε       0.87
he         0.86
Positive Logits
ir         1.33
unlike     1.27
Unlike     1.26
im         1.20
os         1.19
uts        1.17
یت         1.02
я          1.02
uk         0.98
il         0.97
Activations Density 0.004%