INDEX
    Negative Logits
    Token       Logit
    _agents     -0.07
    خبر         -0.07
     Auto       -0.07
    _comments   -0.06
    Page        -0.06
    traction    -0.06
     OMIT       -0.06
    λου         -0.06
    span        -0.06
    Pok         -0.06
    POSITIVE LOGITS
    Token       Logit
     prejud     0.08
     dwell      0.07
    +k          0.07
     оці        0.07
     paired     0.06
    ďte         0.06
                0.06
     eyed       0.06
     shack      0.06
     leuk       0.06
    Activation Density: 0.031%

    No Known Activations