INDEX
    Explanations

    phrases indicating similarity or identicality

    instances of the word "same."

    Negative Logits
    ases       -0.80
    rosso      -0.69
    WI         -0.69
    *=-        -0.69
    arest      -0.68
    rection    -0.67
    rique      -0.67
    xtap       -0.66
    efully     -0.66
    meet       -0.64
    Positive Logits
     thing     1.23
     way       0.92
     amount    0.88
     exact     0.88
     old       0.85
     kind      0.84
     damn      0.83
     ol        0.82
     size      0.81
     sized     0.80
    Act Density 0.042%

    No Known Activations