Explanations
terms related to social justice and activism
Negative Logits
izer          -0.20
izers         -0.20
ilation       -0.19
imization     -0.19
isation       -0.19
isers         -0.18
IZATION       -0.18
ellation      -0.17
verage        -0.17
dration       -0.17
Positive Logits
ing            0.27
ographically   0.23
rating         0.23
scri           0.22
actively       0.20
ivating        0.20
astically      0.20
antically      0.20
imating        0.20
itating        0.19
Activation Density: 0.080%