    Explanations

    exclamations or questions expressing emotional responses

    question marks

    Negative Logits
     Diſ              -0.99
     Houſe            -0.96
     ―――――            -0.95
     purpoſe          -0.92
     مرئيه            -0.90
     Majefty          -0.90
     Anſ              -0.88
     $_"              -0.88
     depositphotos    -0.88
     iſt              -0.87
    Positive Logits
    <bos>    2.36
     the     1.05
     and     0.97
    '        0.88
     of      0.80
     for     0.78
     in      0.77
     is      0.73
    ↵↵       0.72
     to      0.71
    Activation Density: 0.668%

    No Known Activations