INDEX
Explanations
phrases that indicate a comparison or evaluation of success
Negative Logits
olec              -0.15
arkin             -0.15
utow              -0.15
Remaining         -0.14
urai              -0.14
ConverterFactory  -0.14
atto              -0.14
aviour            -0.14
<count            -0.13
AME               -0.13
Positive Logits
worse             0.20
improvement       0.18
improve           0.17
mejorar           0.17
improves          0.17
Worse             0.16
Improve           0.16
melhor            0.15
orsche            0.15
improvements      0.15
Activations Density: 0.116%