INDEX
    Explanations

    auxiliary verbs

    Negative Logits
     promoters    -0.07
     Mons         -0.07
     Noble        -0.07
    uzzy          -0.07
     Jill         -0.07
     Î            -0.06
    Vict          -0.06
    (blank)       -0.06
     shale        -0.06
     Stories      -0.06
    Positive Logits
    CLU           0.07
    طرق           0.07
    (encoder      0.07
    Invite        0.07
     artış        0.07
    Conditional   0.07
    ,/            0.07
    (blank)       0.07
    ными          0.07
     Москов       0.07
    Act Density 0.112%

    No Known Activations