    Negative Logits
    elon        -0.15
     b          -0.15
    uli         -0.15
    ikk         -0.14
     present    -0.14
    illis       -0.14
     ep         -0.14
    hta         -0.13
    igne        -0.13
    ICY         -0.13
    Positive Logits
    /stretch      0.17
    istrovství    0.15
    ustos         0.15
    hir           0.14
    ロー            0.14
    ehr           0.14
    ɵ             0.14
    amus          0.14
    virt          0.14
    COPY          0.14
    Activation Density 0.007%

    No Known Activations