    Negative Logits
    " worse"    -0.08
    " hans"     -0.07
    " comes"    -0.07
    " none"     -0.07
    " born"     -0.07
    " ease"     -0.07
    " estar"    -0.07
    " map"      -0.07
    " Praze"    -0.07
    " more"     -0.07
    Positive Logits
    " utilized"    0.07
    "Collector"    0.07
    " Lotus"       0.07
    " citiz"       0.07
    "itizer"       0.07
    "hydration"    0.07
    " Jewelry"     0.07
    " utiliz"      0.07
    " utilis"      0.07
    "��"           0.07
    Activation Density: 0.020%

    No Known Activations