INDEX
Explanations
references to possessive pronouns, particularly "its"
"its" followed by a noun
possessive "its" before nouns
Negative Logits
Token        Logit
Tulane       -0.55
<bos>        -0.52
py           -0.50
Athenians    -0.49
table        -0.47
Ananda       -0.44
probleme     -0.44
Dede         -0.43
test         -0.42
HY           -0.42
Positive Logits
Token        Logit
its          1.18
Its          1.15
Its          1.09
ITS          1.09
它的         1.00   (Chinese: "its")
its          0.86
المعيارى     0.86   (Arabic: "the standard")
Itself       0.85
saraba       0.83
Twas         0.82
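Lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k most negative and k most positive tokens (k = 10 here). The sketch below assumes exactly that procedure; it ignores any final layer norm, and the names `decoder_dir`, `unembed`, `logit_effects` and `top_logits` are hypothetical, not taken from the dashboard's actual pipeline.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) feature decoder direction with each
    token's unembedding column; ignores any final layer norm (assumption)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy 2-dimensional example with made-up vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "its": [1.0, 0.2],
    "Its": [0.9, 0.1],
    "table": [-0.5, -0.1],
    "py": [-0.2, -0.6],
}
neg, pos = top_logits(logit_effects(decoder_dir, unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

With real weights, a feature like this one would push up the logits of "its"-like tokens in several scripts and casings, which matches the positive list.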
Activation Density: 0.202%
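Activation density is usually read as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the function name `activation_density` are assumptions, not the dashboard's documented definition):

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (threshold of 0.0 is an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy sample: 2 active positions out of 1000 gives 0.2%.
sample = [0.0] * 998 + [1.3, 0.7]
print(activation_density(sample))
```

Under this reading, 0.202% means the feature is active on roughly one token in every 500, so it is quite sparse.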