Explanations
phrases related to political or social activism
references to the concept of being anti or against something
Negative Logits
ding       -0.97
cutting    -0.86
edin       -0.83
nets       -0.82
rings      -0.81
mable      -0.79
alties     -0.78
ells       -0.77
erick      -0.77
accompan   -0.77
Positive Logits
opsis      0.85
urdue      0.80
oxin       0.79
iso        0.76
ño         0.73
henko      0.73
Wan        0.71
xon        0.69
chio       0.68
[undecodable byte token]   0.68
Activation Density: 0.060%