Explanations
references to community engagement and support
Negative Logits
Token        Logit
wife         -0.17
wife         -0.17
ed           -0.16
themselves   -0.16
гл           -0.15
deg          -0.15
iled         -0.15
妻           -0.15
solo         -0.15
och          -0.14
Positive Logits
Token        Logit
selves       0.31
tesy         0.29
lives        0.28
ourselves    0.27
SEL          0.27
bodies       0.24
hearts       0.23
dear         0.23
Lives        0.23
mutual       0.22
Activations Density: 0.236%