INDEX
Explanations
themes related to social justice and advocacy
Negative Logits
rak        -0.16
_SAN       -0.15
rees       -0.15
AndServe   -0.15
ogn        -0.15
ابی        -0.14
orn        -0.14
komp       -0.14
asma       -0.14
inne       -0.14
Positive Logits
with       0.34
with       0.26
with       0.25
dengan     0.24
unfavor    0.21
avec       0.21
với        0.21
swith      0.20
together   0.20
ewith      0.19
Activation Density: 0.086%