INDEX
Explanations
key improvements and explanations
Negative Logits
Token        Logit
zych         0.39
asie         0.33
iconductor   0.33
гно          0.33
dre          0.33
IGER         0.33
olit         0.33
lada         0.33
ugly         0.32
扈           0.32
Positive Logits
Token         Logit
improvements  0.61
mejoras       0.52
Improvements  0.50
improvement   0.49
takeaways     0.48
Improvements  0.47
ポイント      0.46
improvements  0.46
features      0.46
개선          0.44
Activations Density 0.004%