    Negative Logits
    (token text missing)    -0.07
    "_office"               -0.07
    " 레벨"                 -0.07
    (token text missing)    -0.07
    "face"                  -0.07
    " trophies"             -0.07
    " Kir"                  -0.07
    "Climate"               -0.07
    "ocaust"                -0.07
    "Forecast"              -0.06
    Positive Logits
    "pr"                    0.09
    "Pr"                    0.09
    (token text missing)    0.07
    " Pr"                   0.07
    "prim"                  0.07
    (token text missing)    0.07
    "-picker"               0.07
    " пож"                  0.06
    "при"                   0.06
    " Khan"                 0.06
    Act Density 0.116%

    No Known Activations