Explanations
specific references to a person's husband
references to a specific individual identified as "husband."
Negative Logits
McC          -0.80
Flavoring    -0.77
ostic        -0.72
obyl         -0.68
ortmund      -0.68
etting       -0.66
spir         -0.66
yss          -0.66
UGE          -0.65
Barkley      -0.65
Positive Logits
hood         0.90
friend       0.87
husband      0.85
pins         0.83
loo          0.78
shake        0.75
pin          0.75
Romeo        0.74
dad          0.73
wife         0.72
Activations Density: 0.019%