INDEX
    Explanations

    concepts related to dominance and power dynamics

    Negative Logits
    aci
    -0.16
    ade
    -0.15
     Savaş
    -0.15
    lich
    -0.14
     tre
    -0.14
    rou
    -0.14
    het
    -0.14
    antee
    -0.14
     inspir
    -0.14
     Mir
    -0.14
    Positive Logits
    easy
    0.22
    容易
    0.21
     easily
    0.21
     easy
    0.21
     Easily
    0.19
     relatively
    0.19
    Easy
    0.19
    易
    0.19
     easiest
    0.18
     fácil
    0.18
    Act Density 0.276%

    No Known Activations