Explanations
phrases related to raising awareness about various social, health, and environmental issues
Negative Logits
Token       Logit
ereotype    -0.16
otype       -0.16
Ĥ           -0.15
ìĸ´         -0.15
ÅĻ          -0.14
omo         -0.14
iska        -0.14
iliz        -0.14
oples       -0.14
endo        -0.14
Positive Logits
Token       Logit
ness        0.30
es          0.28
-ra         0.27
s           0.20
fulness     0.20
/alert      0.20
rais        0.19
(es         0.19
nder        0.18
nes         0.18
Activations Density: 0.022%