    Explanations

    API-related calls and error messages

    Negative Logits
    Token          Logit
    " safety"      -1.58
    " career"      -1.47
    "oken"         -1.36
    " Safety"      -1.36
    "fulness"      -1.36
    "iquit"        -1.29
    "iels"         -1.28
    " jeopardy"    -1.28
    " immunity"    -1.28
    " realism"     -1.27
    Positive Logits
    Token          Logit
    "@"            1.84
    "gets"         1.68
    "#"            1.60
    "#,"           1.56
    "illary"       1.56
    "cott"         1.50
    "itte"         1.50
    " brains"      1.43
    "ubert"        1.42
    "lette"        1.42
    Activation Density: 4.325%

    No Known Activations