Explanations
phrases related to social justice and human rights issues
Negative Logits
issen      -0.16
ategy      -0.15
ervas      -0.15
asz        -0.14
otlin      -0.14
lient      -0.14
outine     -0.14
IFn        -0.14
ìĪł        -0.14
.gov       -0.13
Positive Logits
some        0.37
critics     0.37
many        0.37
experts     0.31
observers   0.30
detr        0.29
Crit        0.28
many        0.27
some        0.27
opponents   0.26
Activations Density: 0.290%
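Activation density is usually read as the share of tokens on which the feature's activation is positive. Under that assumption, 0.290% means the feature fires on roughly 29 of every 10,000 tokens, or about 1 in 345. The sketch below shows that calculation. The `activation_density` helper, its zero threshold, and the sample activations are all hypothetical illustrations, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumes `activations` holds one scalar activation per token for a single
    feature; treating "fires" as "activation > 0.0" is an assumption.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    # Count the tokens on which the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations for 8 tokens; only one is non-zero.
sample = [0.0, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints 12.500%
```

A real density figure would be computed over a much larger token sample, which is why dashboard values such as 0.290% are far smaller than this toy result.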