INDEX
    Explanations

    phrases emphasizing consistency or similarity

    Negative Logits
    Token           Logit
    " etched"       -0.66
    "dale"          -0.63
    (blank)         -0.60
    " spaced"       -0.60
    "bard"          -0.59
    " breath"       -0.59
    " omn"          -0.56
    " com"          -0.55
    " quoted"       -0.55
    " initially"    -0.55
    Positive Logits
    Token           Logit
    "same"          0.87
    " same"         0.77
    "ourke"         0.74
    "chwitz"        0.73
    " result"       0.72
    "conn"          0.70
    "ouses"         0.70
    "olini"         0.69
    " Same"         0.67
    "ighed"         0.67
    Activation Density: 0.162%

    No Known Activations