    Explanations

    repetitive or frequently mentioned nouns

    Top logits

    Negative logits            Positive logits
    Token      Logit           Token      Logit
    ned        -0.20           uded        0.17
    igger      -0.15           sse         0.15
    etto       -0.15           ëłĩ         0.15
    avou       -0.15           ws          0.15
    ernet      -0.15           ISON        0.15
    bote       -0.14           wig         0.14
    gan        -0.14           лÑı         0.14
    inka       -0.14           ison        0.14
    ess        -0.14           atk         0.14
    mund       -0.14           ToProps     0.13
    Activation density: 0.034%

    No known activations.