INDEX
Explanations
terms related to pro-life or anti-abortion sentiments
Negative Logits
.sponge    -0.15
hift       -0.15
h          -0.15
vic        -0.15
βολ        -0.14
pic        -0.14
EMPLARY    -0.14
аду        -0.14
AREST      -0.14
bet        -0.13
Positive Logits
bon        0.20
actively   0.20
tracted    0.19
pped       0.19
choice     0.18
lix        0.17
ponent     0.17
choice     0.17
wl         0.16
prit       0.16
Activations Density: 0.016%