INDEX
Explanations
the word "Anyone" and its variations, indicating a search for inclusive language
Negative Logits
iger            -0.67
urations        -0.65
Hound           -0.65
bows            -0.65
Resurrection    -0.63
rament          -0.63
irth            -0.63
ocamp           -0.63
tnc             -0.63
ories           -0.62
Positive Logits
else            1.76
else            1.33
Else            1.24
Else            1.21
who             1.02
wishing         0.96
who             0.95
subscribed      0.93
interested      0.93
soever          0.89
Activations Density: 0.033%