Explanations
instances of coercive and abusive behavior in relationships
Negative Logits
selves         -0.82
collectively   -0.72
atures         -0.68
£ı             -0.67
miscar         -0.67
taboola        -0.65
husband        -0.64
Founding       -0.64
result         -0.64
etheless       -0.63
Positive Logits
himself        1.32
his            0.99
Himself        0.87
Jr             0.73
me             0.71
girlfriend     0.69
erection       0.69
raping         0.69
Jr             0.69
sed            0.66
Activation Density: 0.406%