INDEX
Explanations
topics related to sexism and racism in society
Negative Logits
Bers          -0.16
_rq           -0.16
ocker         -0.16
izen          -0.15
ogue          -0.15
liers         -0.14
İ·            -0.14
ifestyles     -0.13
apel          -0.13
UIWindow      -0.13
Positive Logits
towards       0.41
toward        0.38
against       0.36
against       0.31
Against       0.30
Towards       0.29
Towards       0.28
Against       0.27
Tow           0.27
åIJij          0.25
Activations Density 0.136%