INDEX
Explanations
phrases related to exclusion, isolation, and avoidance
terms related to exclusion from social, political, or economic contexts
Negative Logits
ebus      -0.77
yss       -0.74
raq       -0.70
acca      -0.70
ectar     -0.66
llular    -0.65
rien      -0.64
EY        -0.63
Herald    -0.62
attr      -0.62
Positive Logits
exclusion   1.01
ary         0.92
ism         0.78
lessness    0.77
spoilers    0.76
prejudice   0.76
screening   0.75
clusion     0.74
naire       0.73
bias        0.72
Activations Density: 0.010%