INDEX
Explanations
contrasting caveats and limitations
Negative Logits
remarkably      1.06
Incidentally    1.00
amazingly       0.98
unbelievably    0.96
невероят        0.95
unfortunately   0.89
可惜            0.86
Sadly           0.86
Incidentally    0.84
even            0.81
Positive Logits
unless          1.09
workaround      1.05
Nonetheless     1.04
unless          1.03
needs           1.01
Nevertheless    1.00
worse           0.93
それでも        0.93
Unless          0.93
inkább          0.92
Activations Density 0.153%