INDEX
    Explanations

    concepts related to influence and power dynamics

    Negative Logits
    vek       -0.16
    -of       -0.16
    ken       -0.15
    eln       -0.15
    uegos     -0.15
    ven       -0.14
    onth      -0.14
    -than     -0.14
     ander    -0.14
    posium    -0.14
    Positive Logits
    icky       0.18
    ¯          0.15
     Bentley   0.14
    ibold      0.14
     Tro       0.14
    RunWith    0.13
     ÑĢоÑģ     0.13
    itm        0.13
    inline     0.13
     èģ        0.13
    Act Density 0.548%
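
The density figure above is usually read as the fraction of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It assumes that "fires" means an activation strictly greater than zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under that reading, a density of 0.548% would correspond to the feature being active on roughly 5 or 6 of every 1,000 tokens sampled.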

    No Known Activations