INDEX
    Explanations

    expressions related to the concept of work and its implications

    Negative Logits
    " works"       -0.21
    " worked"      -0.18
    " Works"       -0.17
    "作品"         -0.17
    "Works"        -0.17
    " trabal"      -0.17
    "znik"         -0.17
    " working"     -0.16
    " lavor"       -0.16
    " workshop"    -0.15
    Positive Logits
    " done"         0.28
    "streams"       0.24
    "done"          0.23
    " ethic"        0.23
    " Done"         0.23
    "horse"         0.23
    "ah"            0.23
    "Done"          0.22
    "loads"         0.22
    "shopping"      0.22
    Act Density 0.051%

    No Known Activations