Explanations
pronouns followed by possessive pronouns
possessive pronouns and phrases indicating ownership or belonging
Negative Logits
Token       Logit
ENCE        -0.77
umption     -0.74
otto        -0.73
ĸļ          -0.72
Trader      -0.72
rium        -0.71
enment      -0.69
matter      -0.68
hip         -0.68
Spoiler     -0.68
Positive Logits
Token         Logit
favorite      1.04
favorites     1.04
favourite     1.02
assistants    0.94
favourites    0.93
own           0.93
daughters     0.92
buddies       0.89
predecessors  0.88
sons          0.87
Activations Density: 0.073%
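
As a rough illustration of how a density figure like this one can be read, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero. The page does not state that definition, so treat it as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample, not data from the page:
# 10,000 token positions, of which 7 have a non-zero activation.
sample = [0.0] * 9993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.073% would mean the feature is active on roughly 7 of every 10,000 sampled tokens.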