INDEX
Explanations
terms related to equity and equality in various contexts
Negative Logits
esian          -0.18
cher           -0.16
rina           -0.15
usercontent    -0.15
anna           -0.15
ãģªãģĦ         -0.15
imore          -0.15
ous            -0.15
essler         -0.15
omen           -0.14
Positive Logits
itarian        0.30
izer           0.30
led            0.29
ivalent        0.26
izers          0.25
ities          0.25
izing          0.24
iser           0.24
ivant          0.23
ivent          0.23
Activations Density: 0.039%