INDEX
Explanations
words related to social themes and societal issues
topics related to social issues and inequalities
Negative Logits
nces       -1.02
20439      -0.94
urat       -0.75
ucket      -0.74
ilts       -0.68
ovie       -0.67
mary       -0.67
Ridge      -0.67
many       -0.67
lists      -0.66
Positive Logits
ization     1.11
izing       1.09
ized        1.03
cohesion    1.00
ize         0.98
justice     0.93
ising       0.93
ised        0.92
norms       0.92
isation     0.91
Activation Density: 0.032%