INDEX
    Explanations

    references to scientific measurements or metrics

    lines, separators, and symbols

    Negative Logits
    "iſen"             -0.88
    " ſich"            -0.85
    " ainfi"           -0.84
    " Geſ"             -0.82
    "帖最后由"          -0.81
    " createSprite"    -0.81
    " zoude"           -0.78
    " ſein"            -0.77
    "majánló"          -0.77
    " ſei"             -0.77
    Positive Logits
    "1"                0.47
    "0"                0.44
    "<blockquote>"     0.44
    "2"                0.40
    "B"                0.39
    "5"                0.39
    "Y"                0.39
    "As"               0.39
    "9"                0.39
    "[toxicity=0]"     0.39
    Activation Density: 0.003%

    No Known Activations