Explanations
the word "women."
references to women or feminine themes
Negative Logits
suspended   -0.74
jails       -0.68
jail        -0.65
decline     -0.64
halves      -0.64
Bots        -0.63
script      -0.63
chill       -0.62
exile       -0.62
refund      -0.60
Positive Logits
omen        5.09
oman        1.86
omes        1.34
ome         1.32
omers       1.21
omon        1.18
oming       1.14
osen        1.10
oms         1.09
omal        1.06
Activations Density 0.016%