INDEX
Explanations
words related to social issues
references to social issues and inequalities
Negative Logits
Token         Logit
nces          -0.92
20439         -0.80
urat          -0.76
ilts          -0.74
Ridge         -0.73
Blossom       -0.66
ucket         -0.65
butt          -0.65
tto           -0.64
sterdam       -0.64
POSITIVE LOGITS
Token         Logit
izing          1.02
ization        1.01
cohesion       0.95
ized           0.95
istic          0.94
norms          0.93
interaction    0.91
networking     0.91
ize            0.90
sciences       0.89
Activations Density: 0.025%