Explanations
themes of infidelity and romantic relationships
Negative Logits
husband         -0.18
丈夫            -0.17
Husband         -0.17
夫              -0.16
sex             -0.16
husbands        -0.15
hus             -0.15
alone           -0.15
sey             -0.14
ultan           -0.14
Positive Logits
whom            0.16
younger         0.15
someone         0.15
compatible      0.15
compatible      0.15
incompatible    0.15
ermann          0.15
afar            0.15
тогда           0.15
viar            0.14
Activations Density 0.202%