INDEX
    Explanations

    expressions of contradictions or unexpected outcomes in societal conditions

    Negative Logits
    Token        Logit
    "ombat"      -0.17
    "acomment"   -0.15
    "udson"      -0.14
    "olkien"     -0.14
    "ptal"       -0.14
    "ince"       -0.14
    "ureau"      -0.14
    "utton"      -0.14
    "imbus"      -0.14
    "edicine"    -0.14
    Positive Logits
    Token        Logit
    "arak"       0.18
    "noch"       0.17
    " STILL"     0.15
    " still"     0.15
    " Still"     0.15
    "still"      0.15
    " Ox"        0.15
    "plx"        0.14
    "Still"      0.14
    "cast"       0.14
    Activation Density: 0.286%

    No Known Activations