Explanations
phrases and concepts related to social dynamics and privilege
Negative Logits
thumbnail   -0.17
HEN         -0.16
ίκ          -0.15
hiba        -0.15
reon        -0.15
Haz         -0.14
oklyn       -0.14
ậy          -0.14
ALLENG      -0.14
groupBox    -0.14
Positive Logits
iner        0.16
dae         0.15
afd         0.15
经验        0.14
hol         0.14
inh         0.14
.glide      0.14
iná         0.14
reim        0.14
Std         0.14
Activations Density 0.460%