INDEX
Explanations
improvements and explanations
Negative Logits
fish      0.80
war       0.74
is        0.73
if        0.73
ur        0.73
k         0.72
one       0.71
i         0.71
I         0.70
people    0.70
Positive Logits
Verbesser       1.08
Improvements    0.89
mejoras         0.83
Improvements    0.82
улуч            0.81
improved        0.81
verbess         0.80
изменений       0.80
увеличения      0.79
improvements    0.79
Activations Density: 0.327%