INDEX
Explanations
mentions of the word "marriage," particularly in the context of important or controversial events
references to marriage, particularly focusing on same-sex marriage
Negative Logits
Token     Logit
istg      -0.71
ACC       -0.68
Sho       -0.66
ops       -0.66
Zap       -0.65
owl       -0.64
wo        -0.64
ensor     -0.63
ateurs    -0.63
alert     -0.62
Positive Logits
Token      Logit
marriage   3.73
Marriage   3.04
marriage   2.89
marriages  2.88
marrying   2.01
marry      1.97
wedding    1.94
marital    1.86
divorce    1.85
married    1.85
Activations Density 0.017%
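
The density figure above can be read as the share of sampled tokens on which the feature is active. The page does not define it, so the sketch below rests on that assumption: density is the count of tokens whose activation exceeds a threshold, divided by the total token count. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of 0.017%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition (not confirmed by the source page):
    density = (# tokens where the feature fires) / (# tokens sampled).
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must contain at least one value")
    firing = sum(1 for a in values if a > threshold)
    return firing / len(values)


# Hypothetical sample: the feature fires on 17 of 100,000 tokens.
sample = [1.5] * 17 + [0.0] * 99_983
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.017% corresponds to roughly 17 active tokens per 100,000, which fits a feature that responds narrowly to marriage-related text.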