INDEX
    Explanations

    phrases related to holding onto beliefs, power, responsibilities, or physical objects

    Negative Logits
    "————"     -0.70
    "DIT"      -0.66
    "nown"     -0.66
    "ãĤ¡"      -0.65
    "ixel"     -0.64
    "gnu"      -0.64
    "ibel"     -0.64
    "ghan"     -0.64
    "ettel"    -0.64
    "endix"    -0.63
    Positive Logits
    "hold"            1.09
    " sway"           1.04
    "erness"          0.98
    " accountable"    0.97
    "holding"         0.97
    "holders"         0.95
    " hostage"        0.92
    " captive"        0.88
    "holder"          0.88
    " onto"           0.87
    Activation Density: 0.491%

    No Known Activations