Explanations
references to collective ownership or communal concepts
Negative Logits
Token     Logit
wife      -0.20
wife      -0.17
husband   -0.15
妻        -0.15
Wife      -0.15
ovit      -0.15
-wife     -0.15
career    -0.15
ed        -0.14
ools      -0.14
Positive Logits
Token      Logit
tesy       0.29
SEL        0.29
ourselves  0.29
selves     0.28
lives      0.28
bodies     0.25
Lives      0.24
our        0.24
hearts     0.24
mutual     0.23
Activations Density 0.221%
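
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the source does not state the exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.221%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 221 of which activate the feature.
sample = [1.5] * 221 + [0.0] * (100_000 - 221)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.221% corresponds to the feature firing on roughly 221 out of every 100,000 tokens.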