INDEX
Explanations
sentence endings followed by pronouns
Negative Logits
ত্রী  0.74
তিপ  0.70
های  0.69
তম  0.67
蚪  0.66
ভূতি  0.66
propylene  0.65
apeut  0.65
Мо  0.64
駐  0.64
Positive Logits
[token not captured]  1.43
Adds  1.33
And  1.26
ujarnya  1.24
Vocabulary  1.21
Tämä  1.21
That  1.20
这句话  1.18
Emotions  1.18
This  1.18
Activations Density 0.326%