    Negative Logits
    Token               Logit
    " irresponsible"    -0.07
    "/Gate"             -0.07
    " btw"              -0.07
    "ecko"              -0.06
    " better"           -0.06
    " good"             -0.06
    ".Messaging"        -0.06
    " blo"              -0.06
    " verbess"          -0.06
    " Trend"            -0.06
    Positive Logits
    Token               Logit
    " unanimous"        0.13
    " unanimously"      0.11
    " unanim"           0.08
    "Year"              0.07
    "composer"          0.07
    "um"                0.07
    "วาง"               0.06
    " Tops"             0.06
    "iterator"          0.06
    "study"             0.06
    Activation Density: 0.001%

    No Known Activations