INDEX
    Explanations

    the substring "ent" within words

    Negative Logits
    "ucer"          -0.18
    "isz"           -0.16
    "kar"           -0.15
    "esco"          -0.15
    "_por"          -0.15
    "pon"           -0.14
    "uce"           -0.14
    "HONE"          -0.14
    "orney"         -0.14
    " Por"          -0.14

    Positive Logits
    " Nightmare"     0.15
    "ños"            0.15
    "andan"          0.15
    "istrov"         0.14
    "AYOUT"          0.14
    " rozh"          0.14
    "aleb"           0.14
    "ียร"            0.13
    " ling"          0.13
    " Wand"          0.13
    Activation Density: 0.000%

    No Known Activations