INDEX
Explanations
phrases related to social issues and activism
references to social issues and social structures
Negative Logits
nces       -0.96
20439      -0.75
urat       -0.72
ilts       -0.71
1001       -0.70
Ridge      -0.70
ARDS       -0.69
peed       -0.67
Blazing    -0.66
zzle       -0.66
Positive Logits
izing         0.99
ized          0.97
istic         0.96
ization       0.95
cohesion      0.89
norms         0.88
democr        0.85
welfare       0.84
sciences      0.82
interaction   0.82
Activations Density: 0.029%