INDEX
    Explanations

    expressions of helplessness or lack of agency

    Negative Logits
    Token               Logit
    ".clipsToBounds"    -0.15
    " comport"          -0.14
    "sing"              -0.14
    "ounge"             -0.14
    "ulace"             -0.14
    "ugo"               -0.13
    "lug"               -0.13
    "iola"              -0.13
    "WhiteSpace"        -0.13
    "ilenames"          -0.13

    Positive Logits
    Token               Logit
    " action"           0.28
    " steps"            0.28
    " nothing"          0.28
    "action"            0.26
    " Action"           0.25
    "-action"           0.25
    "ACTION"            0.25
    " ACTION"           0.24
    " done"             0.24
    " Steps"            0.23
    Activation Density: 0.159%

    No Known Activations