INDEX
    Negative Logits
    Token            Logit
    " declining"     -0.07
    "che"            -0.06
    " submitting"    -0.06
    "_design"        -0.06
    "750"            -0.06
                     -0.06
    " trustees"      -0.06
    " voter"         -0.06
    "_ar"            -0.06
    "_ajax"          -0.06
    Positive Logits
    Token            Logit
    " которым"       0.07
    "Installed"      0.07
    " LB"            0.07
    " Anything"      0.06
    " Cliff"         0.06
    " Clash"         0.06
    " giải"          0.06
    " countered"     0.06
    ".paths"         0.06
    "edad"           0.06
    Activation Density: 0.015%

    No Known Activations