INDEX
    Negative Logits
        Token               Logit
        " contradictory"    -0.07
        " bogus"            -0.07
        " Drum"             -0.06
        "(graph"            -0.06
        " constantly"       -0.06
        "({},"              -0.06
        " foes"             -0.06
        " additionally"     -0.06
        "/n"                -0.06
        "/de"               -0.06
    Positive Logits
        Token               Logit
        ".maven"            0.15
        " JJ"               0.08
        "_GP"               0.07
        "出版社"            0.07
        " Ngb"              0.07
        ")))));↵"           0.07
        "'ят"               0.06
        " Therm"            0.06
        ".cs"               0.06
        " evasion"          0.06
    Activation Density: 0.002%

    No Known Activations