    Negative Logits
    Token             Logit
    " synonymous"     -0.07
    " trivia"         -0.07
    (blank)           -0.07
    " generic"        -0.07
    " Vulner"         -0.07
    " countries"      -0.07
    " cinéma"         -0.07
    "ما"              -0.07
    "刚好"            -0.06
    " specific"       -0.06
    Positive Logits
    Token             Logit
    " DOUBLE"         0.08
    "ccc"             0.07
    ".schedule"       0.07
    "/******/↵"       0.07
    "/callback"       0.07
    "esis"            0.07
    " '\\'"           0.06
    "CKET"            0.06
    " fucks"          0.06
    "etus"            0.06
    Act Density 0.001%

    No Known Activations