INDEX
Explanations
themes related to social inequality and class exploitation
Negative Logits
途             -0.17
lds           -0.16
ichert        -0.16
pery          -0.16
zyst          -0.15
rons          -0.15
.recycle      -0.15
температура   -0.14
ture          -0.14
997           -0.14
Positive Logits
角             0.19
hol           0.17
Watkins       0.17
masters       0.15
eros          0.15
рь            0.14
Hol           0.14
urv           0.14
Dra           0.14
inya          0.14
Activations Density 0.333%