    Explanations

    punctuation and question marks in the text

    Negative Logits
    Token        Logit
    "ordion"     -0.18
    "/tos"       -0.17
    "onymous"    -0.16
    "ìłĪ"        -0.15
    "ATAB"       -0.15
    "zier"       -0.15
    "_Meta"      -0.15
    "ajo"        -0.15
    "oggle"      -0.14
    "lsru"       -0.14
    Positive Logits
    Token        Logit
    " Knox"      0.16
    " Solo"      0.15
    "ificate"    0.14
    "acy"        0.14
    "atti"       0.14
    " Chest"     0.14
    " drib"      0.13
    "ä¾"         0.13
    "602"        0.13
    "avar"       0.13
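    The negative and positive logit lists rank vocabulary tokens by how much this feature pushes their logits down or up. A common way to compute such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix (direction @ W_U) and take the k most negative and k most positive entries. The sketch below uses plain Python and toy values. The names `logit_effects` and `top_logits` are hypothetical, and the numbers are not this feature's weights.

```python
from typing import List, Sequence, Tuple

# Assumption: a feature's logit effect is its decoder direction projected
# through the unembedding matrix (direction @ W_U). Toy values only.

def logit_effects(direction: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Return direction @ w_u, where w_u has d_model rows and vocab_size columns."""
    if len(direction) != len(w_u):
        raise ValueError("direction length must equal the number of rows in w_u")
    vocab_size = len(w_u[0])
    return [
        sum(direction[i] * w_u[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]

def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of three tokens.
direction = [1.0, -0.5]
w_u = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
vocab = ["a", "b", "c"]
neg, pos = top_logits(logit_effects(direction, w_u), vocab, k=1)
print(neg, pos)
```

    With k = 10 and a real vocabulary, the two returned lists would correspond to the two tables above. Leading spaces in tokens such as " Knox" are part of the token itself, which is why they are kept inside the quotes.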
    Activation Density: 0.005%
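    Activation density is typically reported as the percentage of sampled tokens on which the feature fires. Under that reading, 0.005% works out to roughly one token in 20,000 (100 / 0.005 = 20,000). The sketch below assumes that "fires" means an activation strictly greater than zero. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: 5 firing tokens out of 100,000 gives 0.005%.
sample = [0.0] * 100_000
for idx in (10, 2_000, 30_000, 55_555, 99_999):
    sample[idx] = 1.3
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.005%"
```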

    No Known Activations