Explanations
possessive pronouns and articles after commas
Negative Logits
carried      0.56
MAN          0.55
From         0.54
Works        0.51
இருந்தது     0.51
This         0.50
Detailed     0.49
worked       0.48
triggered    0.48
сима         0.48
Positive Logits
your         0.83
our          0.80
the          0.69
vaš          0.67
vaše         0.67
আপনার        0.66
私たちの      0.66
their        0.65
നിങ്ങളുടെ     0.65
ஒரு          0.64
Activations Density: 0.001%