    Explanations

    references to physical actions or qualities related to objects

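    Negative Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
        Token          Logit
        " increa"      -1.67
        " emphat"      -1.64
        " effe"        -1.58
        " encomp"      -1.57
        " reluct"      -1.56
        " alre"        -1.55
        " fta"         -1.53
        " suscep"      -1.51
        " impra"       -1.51
        " accla"       -1.51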
    Positive Logits
        Token          Logit
        " until"        0.98
        " throughout"   0.97
        " while"        0.86
        " despite"      0.84
        " during"       0.84
        "until"         0.82
        " till"         0.80
        "keep"          0.79
        " maintained"   0.77
        "maintained"    0.77
    Activation Density: 0.344%

    No Known Activations