INDEX
Explanations
words related to societal issues and constructs
references to social issues and inequalities
Negative Logits
nces      -0.96
urat      -0.75
20439     -0.75
Ridge     -0.72
butt      -0.69
peed      -0.68
Blossom   -0.68
ucket     -0.67
ilts      -0.66
ved       -0.65
Positive Logits
izing      1.05
ization    1.03
ized       0.98
istic      0.96
norms      0.96
cohesion   0.89
ize        0.88
ising      0.87
sciences   0.87
isation    0.86
Activations Density: 0.029%