    Negative Logits
    Token              Logit
    " treated"         -0.07
    " Lent"            -0.07
    " setBackground"   -0.07
    "็นอ"              -0.07
    "ПО"               -0.06
    "((&"              -0.06
    " causal"          -0.06
    " sendData"        -0.06
    "ニニ"             -0.06
    " гип"             -0.06
    Positive Logits
    Token              Logit
    " illusions"       0.07
    " trump"           0.06
    " Carly"           0.06
                       0.06
    " меня"            0.06
    " initials"        0.06
    " Roulette"        0.06
    "sty"              0.06
    " cheerful"        0.06
    "arial"            0.06
    Activation Density: 0.002%
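Activation density is commonly reported as the percentage of sampled token positions on which the feature fires. The page states neither its threshold nor its sample size. This minimal sketch assumes that "fires" means an activation strictly above zero. Under that assumption, 0.002% corresponds to 2 active positions per 100,000 tokens. The function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    activations = list(activations)
    if not activations:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy example: 2 nonzero activations among 100,000 positions.
acts = [0.0] * 100_000
acts[10] = 1.3
acts[500] = 0.4
print(activation_density(acts))  # 0.002
```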

    No Known Activations