Explanations
possessive pronouns and their associated subject
Negative Logits
છું  0.43
část  0.41
itself  0.40
desej  0.40
중요  0.38
গুরুত্বপূর্ণ  0.37
زیب  0.37
த்தக  0.36
розта  0.36
kısmı  0.36
Positive Logits
greatest  0.89
bread  0.76
biggest  0.72
specialty  0.65
superpower  0.65
grootste  0.65
biggest  0.63
lifeline  0.61
speciality  0.61
undo  0.61
Activations Density: 0.027%