INDEX
Explanations
mentions of criticism or conflict
Negative Logits
Token      Logit
ILCS       -0.79
husbands   -0.73
husband    -0.66
)].        -0.64
arers      -0.64
ante       -0.64
selves     -0.61
metre      -0.61
GOODMAN    -0.61
ahi        -0.60
Positive Logits
Token      Logit
pard       0.89
pardon     0.80
Mexicans   0.80
Vladimir   0.78
Russia     0.78
his        0.77
Pence      0.75
Putin      0.75
himself    0.74
Monica     0.74
Activations Density 0.580%