INDEX
Explanations
references to social issues and movements
Negative Logits
Token       Logit
668         -0.15
eward       -0.15
/generated  -0.15
بÙĨدÛĮ      -0.14
á»ķ         -0.14
ãĥĹãĥª      -0.13
elder       -0.13
ãģĨãģ¡      -0.13
ìħĺ         -0.13
688         -0.13
Positive Logits
Token       Logit
atty        0.16
coli        0.15
ationally   0.15
é¹          0.15
readcr      0.15
agli        0.14
_nh         0.14
ogs         0.14
oload       0.14
peak        0.14
Activations Density 0.603%