Explanations
relationships and dynamics surrounding fidelity and romance
Negative Logits
Frau        -0.34
woman       -0.32
vrouw       -0.30
wife        -0.29
woman       -0.29
Woman       -0.29
wife        -0.28
lady        -0.28
wives       -0.28
-wife       -0.27
Positive Logits
male        0.38
males       0.38
husbands    0.37
boyfriend   0.37
男子        0.34
husband     0.33
guy         0.33
masculine   0.32
men         0.32
handsome    0.32
Activations Density 0.547%