    Negative Logits
    Token           Logit
    cool            -0.07
    ораль           -0.06
     Пари           -0.06
     Photographer   -0.06
    wjgl            -0.06
     bootstrap      -0.06
     сек            -0.06
     prophets       -0.06
    amburger        -0.06
     Weeks          -0.06
    Positive Logits
    Token           Logit
    (details        0.08
    alte            0.07
                    0.07
    ovat            0.06
    Type            0.06
                    0.06
    ателем          0.06
    ['              0.06
    ERR             0.06
    reed            0.06
    Activation Density: 0.013%

    No Known Activations