INDEX
    Negative Logits
     unidad      -0.07
    Ana          -0.07
    ghost        -0.07
    Severity     -0.06
    WithPath     -0.06
    dere         -0.06
     manifest    -0.06
    пример       -0.06
    overy        -0.06
     Exhib       -0.06
    Positive Logits
                 0.07
     beam        0.06
                 0.06
     Majority    0.06
                 0.06
    лся          0.06
    cb           0.06
     बच          0.06
    ami          0.06
     assure      0.06
    Activation Density: 0.003%

    No Known Activations