INDEX
    Explanations

    references to planning and organization-related concepts

    Negative Logits
    Token            Logit
    " Represents"    -0.16
    "comm"           -0.15
    "ras"            -0.15
    "ira"            -0.14
    "ender"          -0.14
    " Origin"        -0.14
    "igroup"         -0.14
    "oke"            -0.14
    "vide"           -0.13
    "zh"             -0.13
    Positive Logits
    Token            Logit
    " lies"          0.32
    " lie"           0.27
    " besides"       0.26
    "lies"           0.25
    " include"       0.24
    " is"            0.24
    " Lies"          0.23
    " lied"          0.22
    " involve"       0.21
    " Lie"           0.21
    Activation Density: 0.162%

    No Known Activations