INDEX
Explanations
references to personal ownership or identity
Negative Logits
mi           -0.20
themselves   -0.18
incy         -0.16
mand         -0.15
no           -0.15
long         -0.15
mu           -0.15
rance        -0.14
mat          -0.14
me           -0.14
Positive Logits
embros   0.27
opia     0.26
opic     0.25
rtle     0.23
anmar    0.23
riad     0.21
adows    0.21
batis    0.20
/us      0.19
zzo      0.19
Activation Density: 0.197%