INDEX
Explanations
phrases related to social justice and the ethical implications of policies
Negative Logits
elman     -0.15
antis     -0.14
itan      -0.14
ignet     -0.14
eg        -0.14
уста      -0.14
idden     -0.14
ussen     -0.14
ools      -0.14
iscard    -0.14
Positive Logits
inform    0.19
supers    0.18
trump     0.17
Inform    0.17
proceed   0.16
happen    0.16
receives  0.16
Proceed   0.15
co        0.15
£¨        0.15
Activations Density 0.257%