INDEX
Explanations
words related to social issues or societal concerns
references to social issues and concepts
Negative Logits
nces      -0.93
xual      -0.80
20439     -0.71
1001      -0.69
ller      -0.67
ï¸ı       -0.66
Centauri  -0.65
butt      -0.65
zzle      -0.65
Blossom   -0.65
POSITIVE LOGITS
ized      0.92
izing     0.90
istic     0.89
ization   0.83
norms     0.83
democr    0.82
ised      0.82
welfare   0.80
ists      0.80
izes      0.79
Activations Density: 0.026%