INDEX
Explanations
elements related to societal issues and advocacy
Negative Logits
Token     Logit
785       -0.15
oping     -0.15
åĽ        -0.14
occo      -0.14
uc        -0.14
whereas   -0.14
awy       -0.14
åįļ       -0.13
aw        -0.13
776       -0.13
Positive Logits
Token     Logit
èĥĮ       0.17
dff       0.14
ê²ĥìĿĢ    0.14
_:*       0.14
ì¹ĺëĬĶ    0.14
ìŀIJëĬĶ   0.14
"is       0.14
ELL       0.13
butt      0.13
è¶Ĭ       0.13
Activations Density: 0.462%