    Negative Logits
     initialized    -0.07
    .Owner          -0.07
     whip           -0.07
     Members        -0.07
     converge       -0.06
    Calcul          -0.06
     Dak            -0.06
     hobby          -0.06
    Control         -0.06
     Likes          -0.06
    Positive Logits
     самой          0.07
    cies            0.07
    okrat           0.07
     звичай         0.07
     souvis         0.06
    ção             0.06
    oland           0.06
     headset        0.06
                    0.06
     anmeld         0.06
    Act Density 0.018%

    No Known Activations