Negative Logits
하          -0.08
ces         -0.08
영상        -0.08
Bc          -0.08
cec         -0.08
CEC         -0.07
dagger      -0.07
aska        -0.07
ECC         -0.07
headings    -0.07
Positive Logits
witte        0.09
stereotypes  0.09
/social      0.09
thousands    0.08
stereotyp    0.08
quotas       0.08
Thousands    0.08
stereotype   0.08
iggs         0.08
fox          0.07
Activations Density 0.005%