INDEX
Explanations
possessive forms indicating ownership or association
Negative Logits
TAIN        -0.86
obin        -0.84
ij士        -0.77
udo         -0.74
quished     -0.73
yrs         -0.72
nir         -0.72
arians      -0.72
ernels      -0.71
orns        -0.71
Positive Logits
own          1.25
favorite     1.08
favourite    1.06
genitals     0.94
birthday     0.92
behalf       0.91
willingness  0.90
anatomy      0.89
imagination  0.87
wardrobe     0.87
Activations Density: 0.043%