INDEX
    Negative Logits
    " бө"             -0.09
    "builtin"         -0.08
    " bott"           -0.08
    " paragraphs"     -0.08
    "_encode"         -0.07
    " clarified"      -0.07
    " rea"            -0.07
    "Builtin"         -0.07
    " recommending"   -0.07
    "bl"              -0.07
    Positive Logits
    " criminals"      0.09
    " frowned"        0.08
    " delinc"         0.08
    " crimin"         0.08
    "щиков"           0.08
    " savvy"          0.08
    " antics"         0.08
    " наруш"          0.08
    "LOGY"            0.07
    " caric"          0.07
    Activation Density: 0.012%

    No Known Activations