INDEX
Explanations
phrases related to rights and social justice issues
Negative Logits
Token      Logit
ーダ       -0.17
ava        -0.16
rud        -0.16
kiye       -0.15
AVA        -0.15
imest      -0.15
زه         -0.14
タル       -0.14
acher      -0.14
olvable    -0.14
Positive Logits
Token      Logit
covers     0.17
enos       0.17
cover      0.16
æª         0.16
-cover     0.16
covering   0.15
'o         0.15
cover      0.15
Covers     0.15
yw         0.15
Activations Density 0.020%