    Negative Logits
     cron       -0.06
    RITE        -0.06
     soir       -0.06
     unbelie    -0.06
     würde      -0.06
                -0.06
     hey        -0.06
     had        -0.06
     Toro       -0.06
    [now        -0.06
    Positive Logits
    aph         0.07
     дів        0.07
    _uint       0.07
    Players     0.07
    .faces      0.07
     assisting  0.07
    issues      0.07
    _general    0.06
    options     0.06
     motions    0.06
    Activation Density: 0.035%

    No Known Activations