INDEX
Explanations
descriptions of inadequacy or error
Negative Logits
真的很	0.47
真的是	0.46
是真的	0.42
вполне	0.41
durchaus	0.40
没有任何	0.40
Thankfully	0.40
obvious	0.39
অবশ্য	0.39
明明	0.39
Positive Logits
unreliable	1.26
ineffective	1.13
inadequate	1.11
unsatisfactory	1.09
suboptimal	1.08
flawed	1.04
problematic	1.00
imperfect	0.98
inefficient	0.95
questionable	0.94
Activations Density 0.023%