INDEX
    Explanations

    phrases or terms related to geopolitical conflicts or controversies

    demonstrative or relative pronouns indicating specific entities or groups

    Negative Logits
        "Returns"          -0.73
        "Untitled"         -0.72
        "laughs"           -0.68
        "uces"             -0.62
        " increments"      -0.62
        "prints"           -0.61
        " stays"           -0.61
        "Alright"          -0.61
        " wheels"          -0.60
        "lyss"             -0.60
    Positive Logits
        " were"             1.15
        " include"          1.04
        " have"             1.00
        " constitute"       0.99
        " are"              0.98
        " comprise"         0.96
        " weren"            0.94
        " violate"          0.94
        " allege"           0.93
        " dominate"         0.89
    Activation Density: 0.200%

    No Known Activations