    NEGATIVE LOGITS
    Token            Logit
    " некотор"       -0.07
    "693"            -0.07
    " теж"           -0.06
    "्दर"            -0.06
    "-te"            -0.06
    " zijn"          -0.06
    " door"          -0.06
    "ITO"            -0.06
    " transporte"    -0.06
    "пион"           -0.06
    POSITIVE LOGITS
    Token            Logit
    " computer"      0.08
    ".ba"            0.08
    " computers"     0.07
    "/assets"        0.07
    " Level"         0.06
    " attrs"         0.06
    " Mits"          0.06
    " дити"          0.06
    "-catching"      0.06
    "/scripts"       0.06
    Activation Density: 0.025%

    No Known Activations