    Negative Logits
    at          0.81
                0.76
    is          0.71
    kk          0.71
    bird        0.70
     Impulse    0.70
    splash      0.69
    ll          0.69
    cul         0.69
    el          0.68
    Positive Logits
    -           1.24
    -...        1.01
    -'          0.95
    -$\         0.88
    -*          0.88
    -)          0.86
     мира       0.84
    -]          0.84
    -【          0.82
    -'+         0.80
    Act Density 0.001%

    No Known Activations