Explanations
pronouns referring to a person
Negative Logits
for  0.51
我們  0.48
forgery  0.47
我們要  0.47
我们就  0.47
ି  0.45
ஒன்றை  0.45
কিছু  0.44
আমরা  0.44
ோம்  0.44
Positive Logits
wrote  0.49
flew  0.46
ffes  0.46
texted  0.45
could  0.44
wore  0.44
mming  0.44
can  0.44
泂  0.43
czy  0.43
Activations Density 0.000%