INDEX
    Explanations

    punctuation and symbols indicating structure or emphasis in the text

    Negative Logits
    Token           Logit
    ".metamodel"    -0.14
    "ãĤ¤ãĤ¯"        -0.14
    "ÙħاÙĦ"        -0.14
    "ά"            -0.14
    "ĶåĽŀ"          -0.13
    " Aure"         -0.13
    "+A"            -0.13
    " bidi"         -0.13
    "ÑĮе"           -0.13
    "Ĥ¹"            -0.13
    Positive Logits
    Token           Logit
    "992"           0.17
    " Fol"          0.17
    " yesterday"    0.17
    "stery"         0.16
    " Yesterday"    0.15
    " IDirect"      0.15
    "esterday"      0.15
    "defaults"      0.14
    "ollow"         0.14
    "Yesterday"     0.14
    Activation Density: 0.049%

    No Known Activations