INDEX
    Explanations

    phrases that indicate signs of change or shifts in societal or political dynamics

    Negative Logits
    egrity
    -0.16
    .opens
    -0.15
     silence
    -0.14
    あげ
    -0.14
    なる
    -0.14
    cour
    -0.14
     Silence
    -0.13
    _Detail
    -0.13
    σια
    -0.13
    82
    -0.13
    Positive Logits
     how
    0.26
     собой
    0.21
    omething
    0.20
     something
    0.20
     intent
    0.20
     why
    0.18
     intents
    0.17
    how
    0.17
    一种
    0.17
     where
    0.16
    Act Density 0.300%

    No Known Activations