Explanations
instances of bans or restrictions related to individuals or events
Negative Logits
utes        -0.17
chers       -0.17
elere       -0.16
aversable   -0.15
ulong       -0.15
ateral      -0.14
eldorf      -0.14
ingly       -0.14
acher       -0.14
avirus      -0.14
Positive Logits
woman       0.26
someone     0.22
man         0.21
guy         0.21
young       0.21
couple      0.19
girl        0.18
person      0.17
employee    0.17
lady        0.17
Activations Density: 0.244%