INDEX
    Explanations

    Terms related to societal and political issues, particularly justice, reform, and systemic challenges.

    Negative Logits
    Token              Logit
    " Brow"            -0.58
    " Stab"            -0.56
    "laugh"            -0.55
    "cise"             -0.55
    "hin"              -0.54
    " Honour"          -0.54
    " Saying"          -0.53
    " Coral"           -0.52
    "ullivan"          -0.52
    " Thanksgiving"    -0.52
    Positive Logits
    Token              Logit
    " exists"          1.14
    " improves"        1.07
    " dominates"       1.03
    " tends"           1.02
    " occurs"          1.01
    " persists"        1.00
    " requires"        0.99
    " isn"             0.98
    " involves"        0.98
    " entails"         0.98
    Activation Density: 0.176%

    No Known Activations