    Explanations

    punctuation marks and common programmatic formatting symbols

    Negative Logits
        " Gent"      -0.15
        "verts"      -0.14
        " carving"   -0.14
        " Ward"      -0.14
        "eller"      -0.14
        " Carol"     -0.14
        " judge"     -0.14
        " infl"      -0.14
        " laugh"     -0.13
        " sinon"     -0.13
    Positive Logits
        "format"      0.40
        ".format"     0.35
        " format"     0.34
        "-format"     0.32
        " Format"     0.31
        " formats"    0.28
        "格式"        0.28
        ".Format"     0.27
        "Format"      0.26
        "_format"     0.25
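    Logit lists like these are commonly produced by projecting a feature's decoder (output) direction onto the model's unembedding matrix and ranking every vocabulary token by the resulting logit contribution. The page does not say how its lists were computed, so this is an assumption. The sketch below uses plain Python, a hypothetical 3-dimensional toy vocabulary, and hypothetical helpers `feature_logits` and `top_bottom`; real models use the full unembedding matrix.

```python
from typing import Dict, List, Tuple

def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_bottom(logits: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending by logit
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical toy decoder direction and unembedding vectors (not real model weights).
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    "format":  [0.4, 0.1, 0.0],
    ".format": [0.3, 0.0, -0.1],
    " Gent":   [-0.1, 0.2, 0.1],
    " laugh":  [0.0, 0.3, 0.2],
}

pos, neg = top_bottom(feature_logits(decoder_dir, unembed), k=2)
print("positive:", pos)
print("negative:", neg)
```

    Sorting once and slicing both ends keeps the sketch simple; for a vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` avoids a full sort.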
    Activation Density: 0.006%
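    Activation density is usually reported as the share of token positions in a sample on which the feature's activation is above zero. The page does not state the sample or the threshold, so both are assumptions here. Under that reading, 0.006% is a fraction of 0.00006, or about 6 tokens in every 100,000. A minimal sketch with a hypothetical `activation_density` helper and a synthetic sample:

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample instead of dividing by zero.
    return 100.0 * active / total if total else 0.0

# Hypothetical sample: 100,000 token positions, of which 6 fire.
sample = [0.0] * 100_000
for i in (10, 2_000, 35_000, 50_001, 77_777, 99_999):
    sample[i] = 1.0

print(f"{activation_density(sample):.3f}%")  # 0.006%
```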

    No Known Activations