INDEX
Explanations
terms related to social or political activism
references to social movements and collective actions
Negative Logits
Recent          -0.64
erning          -0.63
amiliar         -0.63
utics           -0.61
particulars     -0.61
addle           -0.59
serving         -0.58
distinctive     -0.57
odic            -0.57
doms            -0.55
Positive Logits
didnt           1.60
pretended       1.21
pissed          1.18
forgot          1.18
screamed        1.17
lied            1.16
messed          1.16
decided         1.14
blew            1.13
cheated         1.13
Activations Density 0.610%