INDEX
    Explanations

    phrases indicating a speaker's self-reference or directives

    Negative Logits
    Token         Logit
    "ivet"        -0.17
    "apult"       -0.15
    "_hal"        -0.15
    "ICENSE"      -0.14
    " GOODMAN"    -0.14
    "TEMPL"       -0.14
    "ablo"        -0.14
    "ÅĻiv"        -0.14
    "568"         -0.14
    "IGNAL"       -0.13
    Positive Logits
    Token            Logit
    " clearing"      0.15
    " stip"          0.15
    "_simps"         0.15
    "ecta"           0.14
    " confession"    0.14
    " ultra"         0.14
    " klar"          0.14
    " Haz"           0.14
    "ECT"            0.14
    " Mis"           0.14
    Activation Density: 0.085%

    No Known Activations