    Explanations

    punctuation marks and their patterns

    Negative Logits
    Token            Logit
    "forme"          -0.15
    "isContained"    -0.14
    "nze"            -0.14
    "97"             -0.14
    "ţi"             -0.13
    " lr"            -0.13
    "итом"           -0.13
    "ture"           -0.13
    "ibar"           -0.13
    "ipzig"          -0.13
    Positive Logits
    Token            Logit
    "201"             0.30
    "200"             0.28
    "202"             0.24
    "199"             0.22
    "late"            0.21
    "198"             0.19
    "197"             0.18
    "000"             0.18
    " late"           0.17
    "mid"             0.16
    Activation density: 0.024%

    No Known Activations