INDEX
    Explanations

    instances of the word "original" or related terms

    Negative Logits
    Token           Logit
    " mere"         -0.15
    "untu"          -0.15
    "&W"            -0.15
    "mere"          -0.15
    "cken"          -0.15
    " Wiley"        -0.14
    "hle"           -0.14
    " ranks"        -0.14
    "ow"            -0.14
    "arta"          -0.14

    Positive Logits
    Token           Logit
    "/original"     0.23
    "ity"           0.19
    "mente"         0.18
    "undos"         0.17
    "-fashioned"    0.17
    "ities"         0.17
    "ised"          0.16
    ".Formatter"    0.15
    "arily"         0.15
    "-original"     0.15
    Activation Density: 0.027%

    No Known Activations