INDEX
Explanations
mentions of societal power dynamics and shifts
Negative Logits
engo    -0.17
oge    -0.16
še    -0.16
zend    -0.15
ovah    -0.15
(çģ«    -0.15
eyen    -0.14
hea    -0.14
oice    -0.14
òng    -0.14
Positive Logits
Lebens    0.15
ÙħÙĨت    0.14
å±ĭ    0.14
756    0.13
ourselves    0.13
describe    0.13
/var    0.12
uki    0.12
.mContext    0.12
ing    0.12
Activations Density: 1.148%