INDEX
    Explanations

    phrases related to transformation, or to changing an object into another form

    Negative Logits
    erity         -0.75
    inately       -0.72
    ran           -0.68
    yright        -0.68
    ritz          -0.67
     forbids      -0.66
    cies          -0.65
    raint         -0.64
    no            -0.64
    enance        -0.64
    Positive Logits
     usable       0.84
     something    0.72
    ãĥ¼ãĥ        0.69
     a            0.68
     fodder       0.68
     ashes        0.67
     surrogate    0.67
     an           0.65
     profitable   0.62
    quished       0.61
    Act Density 0.052%

    No Known Activations