Explanations
references to infidelity and marital issues
Negative Logits
antity        -0.16
clusive       -0.15
homophobic    -0.15
acci          -0.15
ylan          -0.15
Einsatz       -0.15
essim         -0.14
opr           -0.14
#+#           -0.14
persec        -0.14
Positive Logits
affairs       0.33
affair        0.32
Affairs       0.27
dall          0.26
poly          0.25
extr          0.25
phil          0.25
serial        0.22
bedding       0.22
prom          0.22
Activations Density 0.247%