INDEX
Explanations
pronouns and surrounding punctuation
Negative Logits
踰    0.39
tocó    0.38
хара    0.38
quốc    0.38
사의    0.37
ヒ    0.37
xác    0.36
ুপ    0.35
冎    0.35
φος    0.35
Positive Logits
it    0.62
它可以    0.59
It    0.58
Suitable    0.57
它    0.56
simply    0.54
同时也    0.52
meanwhile    0.52
它    0.52
moreover    0.51
Activations Density 0.000%