INDEX
Explanations
Chinese, Spanish, and English prefixes
Negative Logits
MUX  0.38
regarding  0.37
lâm  0.37
দমন  0.36
有關  0.36
ರಿಗೆ  0.35
concerning  0.35
ureshi  0.34
俘  0.34
Exempt  0.34
Positive Logits
一下  0.50
Vice  0.44
夗  0.43
वस्था  0.39
Stable  0.38
看一下  0.38
Under  0.38
vice  0.38
跑步  0.38
Chair  0.37
Activations Density: 0.004%