INDEX
    Explanations

    comparisons indicating improvement or superiority

    instances of the word "better."

    Negative Logits
    Token           Logit
    "iasco"         -0.70
    "lies"          -0.69
    " Contents"     -0.63
    "endant"        -0.61
    "ital"          -0.60
    "lig"           -0.60
    "opsis"         -0.60
    "aline"         -0.59
    "achusetts"     -0.59
    "achi"          -0.59
    Positive Logits
    Token           Logit
    " better"       3.42
    "better"        2.85
    " Better"       2.07
    " worse"        2.06
    "Better"        2.06
    " nicer"        2.05
    " smarter"      1.95
    " safer"        1.94
    " stronger"     1.81
    " wiser"        1.76
    Activation Density: 0.032%

    No Known Activations