Explanations
ideologically charged terms related to politics and activism
Negative Logits
Token        Logit
increments   -0.77
Delivery     -0.76
angan        -0.76
bilateral    -0.72
Owner        -0.70
ãĥ¼ãĥ³       -0.68
Ī            -0.65
shown        -0.63
Bulldogs     -0.62
eway         -0.62
Positive Logits
Token        Logit
ervatives    1.44
ervative     1.35
alike        1.09
rejoice      1.08
paces        1.07
everywhere   1.05
who          1.04
peak         1.00
hip          0.92
Anonymous    0.89
Activations Density 0.199%