Explanations
references to familial and marital relationships, particularly involving spouses and their roles
Negative Logits
seguir    -0.46
=>'       -0.46
glyco     -0.45
を出      -0.45
PHAN      -0.43
otis      -0.43
<eos>     -0.43
ekt       -0.42
fan       -0.42
'''       -0.42
Positive Logits
wife      1.39
wives     1.37
spouses   1.23
Wife      1.22
Wife      1.20
spouse    1.17
married   1.14
wife      1.13
Wives     1.10
WIFE      1.08
Activations Density 0.226%
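
A density figure like this is usually read as the fraction of sampled tokens on which the feature is active. The sketch below assumes that definition. The function name, the zero threshold, and the sample activations are all hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all; the dashboard's exact definition is not
    stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: 5 active tokens out of 1000.
sample = [0.0] * 995 + [0.7, 1.2, 0.3, 2.4, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.500%"
```

Under this reading, 0.226% would mean the feature fires on roughly 2 of every 1,000 sampled tokens.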