Explanations
pronouns and possessive determiners referring to people
pronouns and their associated references to individuals
Negative Logits
Token   Logit
haus    -0.85
gap     -0.79
GAN     -0.75
Ïĥ      -0.74
tree    -0.74
TABLE   -0.73
ij士    -0.73
river   -0.72
ÏĦ      -0.71
Slot    -0.70
Positive Logits
Token           Logit
own             1.42
willingness     1.22
inability       1.17
penchant        1.11
entire          1.03
favourite       1.03
consequ         1.00
unwillingness   0.99
propensity      0.98
susceptibility  0.94
Activations Density 0.096%