INDEX
Explanations
discussions about social justice and activism
Negative Logits
undy      -0.16
olan      -0.15
odo       -0.14
ascar     -0.14
olas      -0.14
anship    -0.14
reo       -0.14
olith     -0.14
ola       -0.14
okino     -0.14
Positive Logits
resistance     0.21
exempt         0.20
exemptions     0.18
null           0.18
exceptions     0.18
null           0.17
alternative    0.17
modifying      0.17
exemption      0.17
opposing       0.17
Activations Density: 0.131%