INDEX
Explanations
auxiliary verbs followed by descriptors
Negative Logits
chinese  0.48
没有  0.48
可以使用  0.45
চায়  0.44
没有  0.44
incorrect  0.43
مهمه  0.43
Correct  0.43
সাহায  0.42
如果没有  0.42
Positive Logits
inescap  0.66
utterly  0.62
ensued  0.60
倘  0.60
ought  0.60
hereby  0.58
entailed  0.57
portraying  0.57
strikingly  0.57
remarkably  0.56
Activations Density: 0.008%