INDEX
    Explanations

    contractions of "get it" with high activation values

    instances of the word "it"

    Negative Logits
    Token        Logit
    "notice"     -0.72
    " Mans"      -0.71
    "ILE"        -0.65
    "ãĤ±"        -0.64
    "ãĥ´ãĤ¡"     -0.64
    "911"        -0.63
    "767"        -0.59
    "VA"         -0.59
    " Friend"    -0.59
    "762"        -0.58
    Positive Logits
    Token           Logit
    "chy"           1.23
    "alian"         0.86
    "unes"          0.86
    "iner"          0.81
    "atic"          0.69
    " backwards"    0.67
    "atically"      0.67
    "geist"         0.67
    "asca"          0.67
    "ueller"        0.66
    Activation Density: 0.079%

    No Known Activations