INDEX
    Explanations
    Negative Logits
        " Punkten"        -0.08
        " alt"            -0.07
        " telev"          -0.07
        " religiosa"      -0.07
        " Princess"       -0.07
        (token not shown) -0.07
        " faithfully"     -0.07
        "halts"           -0.07
        " religious"      -0.07
        " Patricia"       -0.07
    Positive Logits
        "_coef"           0.08
        " نار"            0.08
        "ΑΣ"              0.08
        " ай"             0.08
        "-central"        0.08
        " coefficient"    0.08
        " Уже"            0.08
        "ื่น"               0.08
        "asẹ"             0.08
        "_modified"       0.07
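The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of one common way to derive such a ranking. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder vector with that token's column of the model's unembedding matrix. The names `decoder_vec`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy model is made up for illustration.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the output logits.
# Assumption: score(token) = decoder_vec . unembed[token] (a "logit lens" reading
# of the feature direction); the dashboard does not confirm this method.


def logit_effects(decoder_vec, unembed):
    """decoder_vec: list of d_model floats.
    unembed: dict mapping token -> list of d_model floats (its unembedding column).
    Returns a dict mapping token -> dot-product score."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (most negative k, most positive k) as lists of (token, score)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
decoder = [1.0, -0.5]
unembed = {"a": [0.1, 0.0], "b": [0.0, 0.2], "c": [0.02, -0.1]}
neg, pos = top_and_bottom(logit_effects(decoder, unembed), k=1)
print(neg, pos)
```

In the toy model, token "b" receives the most negative score and token "a" the most positive, mirroring how the Negative and Positive Logits lists are ordered.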
    Activation Density: 0.014%
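Activation density is usually read as the share of token positions on which the feature is active. Below is a minimal sketch that assumes "active" means an activation strictly above zero; the dashboard does not state its threshold. `activation_density` is a hypothetical helper, not a documented API.

```python
# Hypothetical sketch: activation density = active positions / total positions.
# Assumption: a position counts as active when its activation exceeds `threshold`.


def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 14 active positions out of 100,000 tokens corresponds to a density of 0.014%.
acts = [0.0] * 99_986 + [1.0] * 14
density = activation_density(acts)
print(f"{density:.3%}")
```

At 0.014%, the feature fires on roughly 1 token in 7,000.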

    No Known Activations