INDEX
    Explanations
    Negative Logits
    Token               Logit
    " guilt"            -0.09
    ".alt"              -0.08
    "Stud"              -0.07
    " alt"              -0.07
    " gear"             -0.07
    "Bundles"           -0.07
    " studying"         -0.07
    " menghad"          -0.07
    " discrepancies"    -0.07
    " datetime"         -0.07
    Positive Logits
    Token               Logit
    "ifar"              0.08
    " Philosoph"        0.08
    " byg"              0.07
    "antro"             0.07
    " nhau"             0.07
    "ingers"            0.07
    " другу"            0.07
    " someone"          0.07
    " WM"               0.07
    "ിർ"                0.07
    Activation Density: 0.034%

    No Known Activations