INDEX
    Explanations
    Negative Logits
    (orig            -0.07
     swimming        -0.06
     NaN             -0.06
     AMC             -0.06
    _matches         -0.06
     dripping        -0.06
     декабря         -0.06
     newborn         -0.06
     sums            -0.06
                     -0.06
    Positive Logits
     sexuales        0.07
     playerId        0.07
    -wow             0.06
    adora            0.06
     accidentally    0.06
     Feinstein       0.06
    -N               0.06
    _INT             0.06
     teşekkür        0.06
    ratulations      0.06
    Act Density 0.016%

    No Known Activations