INDEX
Explanations
concepts related to power dynamics and representation in scholarship and discourse
Negative Logits
onz       -0.17
prov      -0.15
inet      -0.15
onda      -0.15
agy       -0.14
ocale     -0.14
eki       -0.14
theme     -0.14
allery    -0.14
rame      -0.14
Positive Logits
disc               0.19
Scalar             0.17
spaces             0.16
Scalar             0.15
noc                0.15
UsersController    0.15
uffs               0.15
ographies          0.15
spaces             0.15
scalar             0.14
Activations Density 0.013%