    Negative Logits
     cameo          -0.08
    inct            -0.08
     craz           -0.08
     Malayalam      -0.08
     rind           -0.07
     sightseeing    -0.07
    ющ              -0.07
     Makeup         -0.07
     wych           -0.07
     Beach          -0.07
    Positive Logits
     foundation     0.08
     offerings      0.08
    388             0.08
    /mac            0.07
                    0.07
    /ST             0.07
     Teg            0.07
     prototype      0.07
     moj            0.07
    /mm             0.07
    Act Density 0.003%

    No Known Activations