INDEX
    Explanations

    repeated references to specific subjects or entities, particularly the word "this"

    Negative Logits
    afx       -0.16
    entin     -0.15
    sip       -0.15
    anko      -0.14
    ント      -0.14
    iges      -0.14
    raní      -0.13
    emble     -0.13
    rax       -0.13
    rowned    -0.13
    Positive Logits
    itos      0.16
    agrams    0.15
    647       0.15
    kest      0.15
    ilst      0.14
    avig      0.14
    Above     0.14
    illy      0.13
    else      0.13
    ata       0.13
    Activation Density: 0.114%

    No Known Activations