    Explanations

    instances of the word "same."

    Negative Logits
    Token          Logit
    " own"         -0.15
    "rious"        -0.15
    "ầm"           -0.14
    "untas"        -0.14
    "cas"          -0.14
    " more"        -0.13
    "amburger"     -0.13
    " propia"      -0.13
    "iesta"        -0.13
    "osemite"      -0.13
    Positive Logits
    Token          Logit
    "-sex"         0.34
    " exact"       0.26
    " thing"       0.26
    " kind"        0.22
    "ãģı"          0.21
    " sort"        0.21
    "exact"        0.21
    " Exact"       0.18
    "Exact"        0.18
    "-origin"      0.18
    Activation Density: 0.052%
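
The density figure above is most likely the fraction of sampled tokens on which the feature fires. The sketch below shows one way such a number could be computed. It assumes "active" means an activation strictly greater than zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the share of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 5 of which activate the feature.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.052% would correspond to roughly 5 active tokens in every 10,000 sampled.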

    No Known Activations