INDEX
Explanations
gent-based words related to cleaning products or processes
words related to diversity and differing perspectives
NEGATIVE LOGITS
©¶æ      -0.83
WT       -0.76
uncture  -0.72
ecause   -0.71
igslist  -0.69
nington  -0.68
ggies    -0.68
ppo      -0.67
ned      -0.67
ivia     -0.66
POSITIVE LOGITS
gent        1.04
rants       0.84
ente        0.82
rant        0.80
rification  0.76
hal         0.76
leness      0.75
encies      0.75
ial         0.73
ran         0.70
Activations Density 0.008%