Explanations
their, belonging, instructions
NEGATIVE LOGITS
MANAGER  0.41
Funds  0.40
comparisons  0.39
rifles  0.38
murdering  0.38
funds  0.37
ायत  0.37
compar  0.37
Compar  0.36
비교  0.36
POSITIVE LOGITS
他們的  0.47
他们的  0.43
羵  0.42
এদের  0.41
THEIR  0.40
Their  0.40
ያላቸው  0.40
тому  0.39
घरे  0.39
चरण  0.39
Activations Density 0.000%