INDEX
Explanations
references to actions involving personal interactions or conflicts
Negative Logits
selves          -0.83
husbands        -0.66
collectively    -0.66
atures          -0.65
unison          -0.64
hub             -0.61
husband         -0.61
womb            -0.61
ballots         -0.60
merger          -0.60
Positive Logits
himself         1.48
his             0.94
Himself         0.93
panic           0.75
erection        0.73
girlfriend      0.69
Jr              0.66
Jr              0.66
ejac            0.65
sul             0.64
Activations Density 0.709%