INDEX
    Explanations

    punctuation marks and formatting elements in written text

    Negative Logits
    .dm         -0.15
    DMI         -0.15
    agh         -0.14
    ?q          -0.14
    osate       -0.14
    NOWLED      -0.14
    isty        -0.14
    /session    -0.14
    ologne      -0.14
    UCT         -0.13

    Positive Logits
    uchos       0.16
     Z          0.15
    akov        0.15
    _internal   0.15
     Div        0.15
     ant        0.14
    lick        0.14
     Daniels    0.14
     div        0.14
    alus        0.14
    Activation Density: 0.001%

    No Known Activations