    Explanations

    instances of punctuation marks at the end of sentences

    instances of classification or categorization terminology

    Negative Logits
    istor       -0.67
    ÃĥÃĤ        -0.60
     tradem     -0.60
    (),         -0.59
    incarn      -0.57
     eleph      -0.56
    ovo         -0.54
    ÃĥÃĤÃĥÃĤ    -0.53
     incarn     -0.53
    ());        -0.53

    Positive Logits
     [          2.69
    [           2.04
     ["         2.01
     [/         1.94
     [-         1.90
    [/          1.81
     ['         1.77
     [(         1.71
     []         1.70
     [*         1.70

    Activation Density: 0.140%
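
    To make the density figure concrete, here is a minimal sketch. It assumes that "activation density" means the fraction of token positions on which the feature's activation is strictly greater than zero. The function name `activation_density`, the `threshold` parameter, and the 100,000-token sample are hypothetical and chosen only for illustration. Under that assumption, a density of 0.140% corresponds to the feature firing on 140 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical definition): a feature "fires" on a token when its
    activation is strictly greater than `threshold` (0.0 by default).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical sample: the feature fires on 140 of 100,000 tokens.
acts = [1.0] * 140 + [0.0] * (100_000 - 140)
print(f"{activation_density(acts):.3%}")  # prints 0.140%
```

    A strict `>` comparison is used so that exact zeros, the usual output of a ReLU-style sparse feature that is inactive, are not counted as firing.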

    No Known Activations