INDEX
    Explanations

    descriptions of situations or actions

    phrases relating to control and power dynamics

    Negative Logits
    âĢ        -0.88
    20439     -0.87
     âĢ       -0.87
    \"        -0.87
    âĹ        -0.82
    ãĢ        -0.82
    âĢIJ      -0.81
    ¨         -0.80
    .","      -0.80
    "},"      -0.80
    Positive Logits
     stuff     1.14
     goddamn   1.13
     dudes     1.12
     kinda     1.11
     weird     1.08
     pretty    1.08
     dude      1.07
     shit      1.07
     crap      1.06
     shitty    1.06
    Activation Density 1.628%

    No Known Activations