    Negative Logits
    Token         Logit
    " кня"        -0.08
    "_BOOK"       -0.07
    " اولین"      -0.07
    " Sci"        -0.07
    "+$"          -0.06
    "undle"       -0.06
    "write"       -0.06
    "<Type"       -0.06
    " Make"       -0.06
    "encode"      -0.06
    Positive Logits
    Token         Logit
    " sarc"       0.07
    " BAR"        0.07
    " bar"        0.07
                  0.07
    "bar"         0.07
    "-bar"        0.07
    " bars"       0.06
    " conforms"   0.06
    "parser"      0.06
    " filters"    0.06
    Act Density 0.005%

    No Known Activations