INDEX
    Explanations

    mentions of societal power dynamics and shifts

    Negative Logits
    engo    -0.17
    oge    -0.16
    še    -0.16
    zend    -0.15
    ovah    -0.15
    (çģ«    -0.15
    eyen    -0.14
    hea    -0.14
    oice    -0.14
    òng    -0.14
    Positive Logits
     Lebens    0.15
    ÙħÙĨت    0.14
    å±ĭ    0.14
    756    0.13
     ourselves    0.13
    	describe    0.13
    /var    0.12
    uki    0.12
    .mContext    0.12
    ing    0.12
    Act Density 1.148%

    No Known Activations