INDEX
    Explanations

    structural elements or formatting markers in code-related contexts

    figure and file references

    Negative Logits
     tillegg            -0.37
    enumi               -0.37
     conmigo            -0.35
     dalších            -0.32
     arşivlendi         -0.31
     nên                -0.30
     also               -0.29
     prévoit            -0.29
     další              -0.28
     derfor             -0.28
    Positive Logits
     ddelwed            0.75
     Dieſe              0.72
     好文分享            0.72
    <unused41>          0.71
    <unused11>          0.71
    <unused8>           0.71
    <unused14>          0.71
    <unused52>          0.71
    [@BOS@]             0.71
    <unused3>           0.71
    Act Density 0.015%

    No Known Activations