    Explanations

    the beginning of text or segments in a document

    Negative Logits
    " Whilst"              -0.99
    "AndEndTag"            -0.95
    " متعلقه"              -0.95
    "\{\\"                 -0.94
    " anyways"             -0.94
    "Whilst"               -0.91
    " imágen"              -0.88
    "+#+#"                 -0.86
    " CreateTagHelper"     -0.84
    " Anyways"             -0.83
    Positive Logits
    (token not shown)      0.65
    " —"                   0.60
    (token not shown)      0.56
    ".—"                   0.56
    (token not shown)      0.52
    "↵↵"                   0.51
    (token not shown)      0.51
    "¦"                    0.50
    " said"                0.49
    " ¦"                   0.46
    Activation Density: 0.136%

    No Known Activations