INDEX
    Explanations

    recurring sequences or patterns in data

    Code snippets and related formatting

    code keywords and symbols

    Negative Logits
    iſen          -1.48
    ロウィン        -1.38
     queſta       -1.38
    majánló       -1.37
     ſind         -1.37
     témoig       -1.34
    ſchaft        -1.30
    <unused14>    -1.30
    <unused8>     -1.30
    [@BOS@]       -1.30
    Positive Logits
    0             0.71
    s             0.64
    hline         0.62
    1             0.62
    -             0.59
    [toxicity=0]  0.58
    \             0.57
    ↵↵            0.57
    9             0.57
    2             0.57
    Activation Density: 0.058%

    No Known Activations