    NEGATIVE LOGITS
    Token           Logit
    "OGND"          -0.72
    "Book"          -0.64
    "béco"          -0.63
    "TintMode"      -0.61
    "ISupport"      -0.60
    " lendemain"    -0.58
    " noDo"         -0.58
    " battre"       -0.58
    " متعلقه"       -0.58
    "клопе"         -0.57
    POSITIVE LOGITS
    Token           Logit
    "WEBPACK"       0.51
    "expandindo"    0.47
    " Ralph"        0.47
    " ralph"        0.46
    " surla"        0.46
    "Ralph"         0.45
    " burning"      0.43
    "Havolalar"     0.42
    " burners"      0.42
    " Nere"         0.42
    Activation Density: 0.004%

    No Known Activations