INDEX
    Explanations

    phrases related to legal issues and safety concerns

    punctuated phrases indicating lists or multiple ideas

    Negative Logits
    ophon           -0.76
    abouts          -0.74
    ibility         -0.72
    Orig            -0.72
    imb             -0.67
    utral           -0.66
    DX              -0.66
    é¾              -0.65
    Availability    -0.64
    MQ              -0.63
    Positive Logits
     lest            1.17
     forgetting      1.00
     eh              0.98
     thereby         0.94
     huh             0.93
     or              0.92
     ruining         0.92
     ignoring        0.89
     knowing         0.88
     ignores         0.88
    Act Density 0.419%

    No Known Activations