INDEX
    Explanations

    weight gain

    Negative Logits
     onward        -0.07
    -flow          -0.06
    ovenant        -0.06
    _Position      -0.06
    _al            -0.06
    Binder         -0.06
     Knight        -0.06
    Seg            -0.06
     nejen         -0.06
     Ок            -0.06

    Positive Logits
     industrial     0.07
     chronic        0.07
                    0.06
    %)↵             0.06
     причина        0.06
     серьез         0.06
    (chalk          0.06
    워크             0.06
     nowadays       0.06
    (bb             0.06

    Act Density 0.005%

    No Known Activations