INDEX
Explanations
terms associated with power dynamics and structural influence
Negative Logits
isode     -0.16
æĶ¹       -0.15
thon      -0.15
stk       -0.15
355       -0.14
.uni      -0.14
ighton    -0.14
535       -0.14
ui        -0.14
ceed      -0.14
Positive Logits
Paging    0.14
Prayer    0.14
undry     0.14
_via      0.14
æĶ¿       0.13
coli      0.13
ureka     0.13
yyn       0.13
otty      0.13
овÑĸд     0.13
Activations Density 0.214%