INDEX
Explanations
mentions of the word "marriage"
references to legal aspects of marriage
Negative Logits
acco          -0.80
OLOG          -0.76
ourning       -0.71
atche         -0.70
Lyons         -0.67
anwhile       -0.67
Flavoring     -0.66
arij          -0.65
Downloadha    -0.65
uden          -0.64
Positive Logits
equality      1.05
equality      0.92
Equality      0.86
riages        0.85
making        0.81
able          0.78
bands         0.78
couples       0.76
marriage      0.75
ring          0.74
Activations Density 0.021%