Explanations
content related to gender, specifically gender-neutral facilities and legal requirements for bathroom use based on gender identity
topics related to gender identity and related policies
Negative Logits
Hacker           -0.64
%]               -0.58
betrayal         -0.57
Journalism       -0.56
extrap           -0.56
¯                -0.55
arrogance        -0.54
}:               -0.53
surprises        -0.53
underest         -0.53
Positive Logits
instead           0.76
izont             0.72
safely            0.69
their             0.67
lawfully          0.67
uninterrupted     0.66
instead           0.66
disabled          0.64
cise              0.63
clusive           0.61
Activation density: 1.120%