INDEX
    Explanations

    words related to efficiency or efforts

    words indicating effort or effectiveness

    Negative Logits
     oath           -0.66
    antha           -0.66
    lords           -0.64
     smartphones    -0.64
    creen           -0.63
     rake           -0.62
     Wolves         -0.62
    BOOK            -0.62
    seed            -0.62
    ²¾              -0.62
    Positive Logits
    luent           1.25
    endi            1.23
    encing          1.18
    emin            1.15
    iency           1.11
    ence            1.05
    usions          1.01
    orts            1.00
    erves           0.99
    usion           0.99
    Activation Density: 0.047%

    No Known Activations