INDEX
    Explanations
    Negative Logits
    " substantive"     -0.06
    " Zur"             -0.06
    " Advisors"        -0.06
    (no token text)    -0.06
    "(del"             -0.06
    "инов"             -0.06
    " Sark"            -0.06
    " 노하우"           -0.06
    " Marino"          -0.06
    " dziew"           -0.06
    Positive Logits
    "utral"            0.06
    (no token text)    0.06
    (no token text)    0.06
    "JM"               0.06
    "isted"            0.06
    "dent"             0.06
    "latent"           0.06
    " organiz"         0.06
    "ufac"             0.06
    " رئیس"            0.06
    Activation Density: 0.002%

    No Known Activations