INDEX
    NEGATIVE LOGITS
    efeller  -0.07
     Graves  -0.06
     presente  -0.06
     refuge  -0.06
    ************************************************************************  -0.06
     undoubtedly  -0.06
    件事  -0.06
     aire  -0.06
    iance  -0.06
     Foam  -0.06
    POSITIVE LOGITS
     woo  0.07
     можна  0.06
    ATORS  0.06
     subscribers  0.06
    Coming  0.06
    EqualTo  0.06
     balcony  0.06
     ger  0.06
    生成  0.06
     gymn  0.06
    Activation Density: 0.024%

    No Known Activations