INDEX
    Explanations
    Negative Logits
    232             -0.06
     endorsing      -0.06
     poids          -0.06
     угл            -0.06
    Cleaning        -0.06
    atisch          -0.06
    lambda          -0.06
    /contact        -0.06
     bottleneck     -0.06
    icaret          -0.05
    Positive Logits
     reports         0.12
     report          0.09
     reportedly      0.08
     Reports         0.08
     rooft           0.07
    /animate         0.07
    _WITH            0.07
     overpower       0.07
     SOURCE          0.06
     připoj          0.06
    Act Density 0.030%

    No Known Activations