    Explanations

    words indicating significant impact or change

    Negative Logits
    Token       Logit
     layer      -0.15
    orer        -0.14
    лам         -0.14
     Clem       -0.13
    anes        -0.13
    aries       -0.13
     Cri        -0.13
     Mehr       -0.13
    vik         -0.13
     attendant  -0.13
    Positive Logits
    Token       Logit
    emey        0.18
    ãĥ¼ãĥį      0.16
    bbc         0.16
    é¾Ħ         0.15
    etas        0.15
    nicos       0.15
    zos         0.15
    ìłĪ         0.14
    Calibri     0.14
    owie        0.14
    Activation Density: 0.001%

    No Known Activations