    Explanations

    the word "lock" or words related to security or control

    Negative Logits
    Token        Logit
    "issance"    -0.87
    "LV"         -0.78
    "schild"     -0.73
    "ãĤ¡"        -0.71
    "enegger"    -0.70
    "abama"      -0.69
    "ilater"     -0.68
    "olf"        -0.66
    "xual"       -0.66
    "resso"      -0.66
    Positive Logits
    Token        Logit
    "picking"    1.25
    "heed"       1.17
    "pick"       1.09
    "creen"      1.06
    "door"       1.02
    "step"       0.96
    "lear"       0.94
    "hold"       0.92
    "downs"      0.91
    " horns"     0.87
    Activation Density: 0.028%

    No Known Activations
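
    The activation density reported above is commonly read as the share of token positions on which a feature fires. The sketch below illustrates that reading only; the function name, the zero threshold, and the sample data are assumptions for illustration, not the dashboard's actual method or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature is active.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations: 10,000 token positions, 3 of them active.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")
```

    Under this reading, a density of 0.028% would correspond to roughly 28 active positions per 100,000 tokens, which fits a feature so sparse that no example activations are listed.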