INDEX
Explanations
normalization and optimization
Negative Logits
s     1.62
ssä   1.01
sp    0.97
nya   0.93
net   0.92
na    0.89
to    0.88
in    0.87
ll    0.87
ni    0.87
Positive Logits
т     1.05
트    1.02
ر     1.02
و     0.99
ي     0.97
зо    0.93
ర్     0.91
р     0.90
та    0.90
ર     0.89
Activations Density 0.510%