INDEX
Explanations
mentions of specific familial relationships, particularly involving a husband
mentions of the word "husband."
Negative Logits
McC          -0.76
ortmund      -0.71
Import       -0.69
spir         -0.68
EVA          -0.66
ourcing      -0.66
JPM          -0.66
Flavoring    -0.66
UGE          -0.65
ostics       -0.65
Positive Logits
hood         0.82
loo          0.79
friend       0.79
pins         0.78
nesday       0.77
pin          0.76
husband      0.75
shake        0.73
dad          0.72
ry           0.69
Activations Density: 0.019%