INDEX
    Explanations

    punctuation

    Negative Logits
    Token           Logit
    "13"            -0.08
    "King"          -0.07
    "وع"            -0.07
    "Few"           -0.07
    "14"            -0.06
    "16"            -0.06
    "_year"         -0.06
    "AST"           -0.06
    " ref"          -0.06
    "жд"            -0.06
    Positive Logits
    Token           Logit
    " breve"        0.07
    " Literal"      0.06
    " zpracování"   0.06
    " Async"        0.06
    " бой"          0.06
    "bye"           0.06
    " toxicity"     0.06
    "-->↵"          0.06
    " nebo"         0.06
    " Bray"         0.06
    Activation Density: 0.009%

    No Known Activations