INDEX
    Explanations

    phrases indicating transformation or change

    Negative Logits
    yy             -0.16
    izz            -0.15
    iry            -0.15
    reece          -0.14
    ves            -0.14
    esi            -0.13
    .normalized    -0.13
    ses            -0.13
    omp            -0.13
    quo            -0.13
    Positive Logits
    ucz            0.17
     part          0.16
    ildo           0.16
    ocard          0.14
    -translate     0.14
    ieve           0.14
    Alle           0.13
    azzi           0.13
    etus           0.13
    stride         0.13
    Activation Density: 0.065%

    No Known Activations