INDEX
    Explanations

    repetitive phrases indicating similarity or comparison

    expressions and phrases that indicate repetition or similarity

    Negative Logits
    front       -0.75
    dash        -0.72
    ãĥĥ         -0.71
    rend        -0.70
    their       -0.70
     Helpful    -0.68
    rection     -0.68
    ENDED       -0.68
    replace     -0.68
    orse        -0.67
    Positive Logits
     applies      1.33
     goes         1.22
     thing        1.15
     happens      1.07
     holds        1.01
     principle    0.99
     cannot       0.99
     principles   0.97
     happened     0.96
     fate         0.93
    Activation Density: 0.043%

    No Known Activations