Explanations
terms associated with social inequality and political critiques
Negative Logits
standpoint    -0.17
approach      -0.17
/downloads    -0.16
arena         -0.16
attempt       -0.15
realm         -0.14
phere         -0.14
Uvs           -0.14
sphere        -0.14
panse         -0.14

Positive Logits
amounts        0.33
levels         0.27
-looking       0.24
versions       0.21
ly             0.20
quantities     0.20
ily            0.20
proportions    0.20
combinations   0.18
levels         0.18
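
Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. This page does not say how its lists were computed, so the sketch below assumes that method. The names `logit_effects`, `decoder_dir`, and `unembed` and the toy numbers are hypothetical, and the sketch skips any normalization or layer-norm folding a real pipeline might apply.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],      # hypothetical: feature decoder direction, length d_model
    unembed: List[List[float]],    # hypothetical: unembedding W_U, shape [d_model][vocab_size]
    vocab: List[str],              # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect.

    Assumes the effect on token j is the dot product of the decoder
    direction with column j of the unembedding matrix (no normalization).
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


# Toy example with d_model = 2 and a 3-token vocabulary.
pos, neg = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0], [0.1, 0.3, -0.4]],
    vocab=["amounts", "standpoint", "levels"],
    k=1,
)
print(pos, neg)
```

With these toy values the scores are 0.15, -0.25, and 0.2, so `levels` ranks as the top positive token and `standpoint` as the top negative one.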

Activation Density: 0.528%
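
Activation density is typically the percentage of sampled tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical. Under that reading, 0.528% means the feature fires on roughly 5.28 of every 1,000 tokens, or about one token in 189.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper: assumes a token counts as active when its
    activation is strictly greater than the threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


print(activation_density([0.0, 0.0, 1.3, 0.0]))  # one active token out of four
```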