INDEX
    Explanations

    elements of formatting and structure in text

    Negative Logits
    Token                 Logit
    " Efq"                -1.00
    "NUMX"                -0.99
    " Jefus"              -0.96
    " pleaſure"           -0.94
    " Theſe"              -0.94
    "ſelf"                -0.93
    "ſelves"              -0.93
    "出版年"              -0.92
    " itſelf"             -0.92
    "ConstraintMaker"     -0.92
    Positive Logits
    Token                 Logit
    "<eos>"               0.96
                          0.91
    "↵↵"                  0.77
    "↵↵↵"                 0.68
    "↵↵↵↵"                0.63
    "↵↵↵↵↵"               0.52
    "."                   0.45
    ":"                   0.45
    "ud"                  0.44
    "↵↵↵↵↵↵"              0.43
    Activation Density: 0.803%

    No Known Activations