INDEX
    Explanations
    Negative Logits
    Token         Logit
    " wore"       -1.26
    " wearing"    -1.20
    " wears"      -1.20
    " wear"       -1.19
    " Wearing"    -1.16
    "Wear"        -1.12
    " Wear"       -1.10
    "wearing"     -1.10
    "wear"        -1.05
    "Wearing"     -1.05
    Positive Logits
    Token         Logit
    "able"         0.59
    "out"          0.58
    "ted"          0.51
    "ing"          0.50
    "o"            0.50
    "ed"           0.49
    "ese"          0.49
    "+]["          0.48
    "bacher"       0.48
    "u"            0.48
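The dashboard does not say how these logit lists are produced. A common convention for sparse-autoencoder feature pages is to project the feature's decoder direction through the model's unembedding matrix W_U and list the most negative and most positive tokens. The sketch below assumes that convention; the function names (`logit_effects`, `top_and_bottom`) and the toy numbers are hypothetical and only illustrate the ranking step.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Dot a (hypothetical) feature decoder direction with each column of W_U.

    w_u is laid out as [d_model][vocab_size], so column t is the
    unembedding vector for token t. Returns one logit effect per token.
    """
    if len(decoder_dir) != len(w_u):
        raise ValueError("decoder_dir and w_u must share d_model")
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 3 tokens (values are made up).
vocab = [" able", " wear", "ed"]
effects = logit_effects([1.0, -1.0], [[0.5, -0.2, 0.1], [-0.5, 0.3, 0.1]])
negative, positive = top_and_bottom(effects, vocab, k=1)
print(negative, positive)  # [(' wear', -0.5)] [(' able', 1.0)]
```

On a real model the vocabulary has tens of thousands of entries and k = 10 would reproduce lists shaped like the two tables above.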
    Activation density: 0.084%
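Activation density is usually read as the fraction of token positions on which the feature fires, so 0.084% corresponds to roughly 84 firing positions per 100,000 tokens. The dataset and firing threshold behind this figure are not stated here; the sketch below assumes "fires" means an activation strictly above 0, and `activation_density` is a hypothetical helper, not a documented API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 84 firing positions out of 100,000 tokens.
acts = [0.0] * 99_916 + [1.0] * 84
print(f"{activation_density(acts):.3%}")  # 0.084%
```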

    No Known Activations