    Explanations

    Winning or achievement

    Negative Logits
    " Dolphin"      -0.09
    "atis"          -0.07
    "ists"          -0.07
    " Lawyers"      -0.06
    " circles"      -0.06
    " writing"      -0.06
    " sensing"      -0.06
    "158"           -0.06
    " detecting"    -0.06
    "pedo"          -0.06
    Positive Logits
    " rugged"       0.08
    "(view"         0.07
    " backstage"    0.07
    " shove"        0.07
    " sor"          0.07
    ".inv"          0.07
    ".roles"        0.06
    "(Debug"        0.06
    " 사회"         0.06
    "_TRANSFER"     0.06
    Act Density 0.030%

    No Known Activations