    Explanations

    phrases that emphasize the significance of particular subjects or concepts

    Negative Logits
    Token         Logit
    "orman"       -0.15
    "amac"        -0.15
    "ungs"        -0.15
    "ål"          -0.14
    "HX"          -0.14
    " somewhat"   -0.14
    "rance"       -0.14
    "оÑħ"         -0.14
    " Howard"     -0.14
    "íķij"        -0.14
    Positive Logits
    Token         Logit
    " thing"      0.24
    "thing"       0.21
    " things"     0.17
    "like"        0.17
    "aklı"        0.16
    "likle"       0.16
    "apy"         0.16
    " coisa"      0.16
    " cosas"      0.15
    "ething"      0.15
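    The page does not say how the positive and negative logits were produced. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and keep the highest- and lowest-scoring tokens. The following is a minimal sketch under that assumption. `feature_logits`, `top_and_bottom`, and the toy vectors and vocabulary are hypothetical and do not reproduce the numbers above.

```python
# Minimal sketch, ASSUMING each listed logit is the dot product of the
# feature's decoder direction with a column of the unembedding matrix.
# All names and numbers here are hypothetical toy values.

def feature_logits(decoder_dir, unembed, vocab):
    """Score every token t as decoder_dir . unembed[:, t]."""
    d_model = len(decoder_dir)
    scores = []
    for t, tok in enumerate(vocab):
        s = sum(decoder_dir[k] * unembed[k][t] for k in range(d_model))
        scores.append((tok, s))
    return scores

def top_and_bottom(scores, k):
    """Return the k highest (descending) and k lowest (ascending) scoring tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[-k:][::-1], ranked[:k]

# Toy model: d_model = 2, vocabulary of 4 tokens (unembed is d_model x vocab).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 2.0]
unembed = [
    [0.5, -1.0, 0.0, 2.0],
    [0.25, 0.5, -1.0, -0.25],
]
scores = feature_logits(decoder_dir, unembed, vocab)
highest, lowest = top_and_bottom(scores, 2)
print("highest:", highest)  # [('tok_d', 1.5), ('tok_a', 1.0)]
print("lowest:", lowest)    # [('tok_c', -2.0), ('tok_b', 0.0)]
```

    Under this assumption, the leading-space variants in the tables above (" thing" vs "thing") are separate vocabulary entries, so each one receives its own score.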
    Activation Density: 0.044%
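    Activation density is usually read as the percentage of token positions on which the feature fires, meaning its activation is above zero. Under that assumed definition, 0.044% corresponds to about 44 firing positions per 100,000 tokens. The sketch below is minimal. The function name `activation_density` and the toy activation list are hypothetical.

```python
# Minimal sketch, ASSUMING "activation density" means the percentage of
# token positions where the feature's activation exceeds a threshold (0 here).
# The function name and toy data are hypothetical.

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy data: 100,000 token positions, the feature fires on 44 of them.
acts = [0.0] * 100_000
for i in range(0, 44_000, 1_000):
    acts[i] = 1.3
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.044%
```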

    No Known Activations