INDEX
    Explanations
    NEGATIVE LOGITS
     domino    -0.08
     compr    -0.08
    ча    -0.08
     Solomon    -0.07
    мента    -0.07
     Munich    -0.07
     Hana    -0.07
    _System    -0.07
     Domino    -0.07
     complemented    -0.07
    POSITIVE LOGITS
     discouraged    0.08
     verboten    0.08
    は禁止    0.08
     Verpflicht    0.08
     ولا    0.08
     creams    0.08
     등의    0.08
     grands    0.08
    Deprecated    0.08
     terlalu    0.08
    Activation Density: 0.001%

    No Known Activations