    Explanations

    phrases related to conflicts, political events, and professional backgrounds

    Negative Logits
    theless        -0.66
     interchange   -0.61
     orally        -0.60
     infringing    -0.54
     uphill        -0.54
     redacted      -0.53
    LESS           -0.53
     soluble       -0.53
     Rabbit        -0.51
     typo          -0.51
    Positive Logits
    ctions         1.06
    ices           0.97
    uments         0.96
    ations         0.95
    sts            0.95
    itions         0.95
    asures         0.94
    ences          0.92
    gments         0.90
    ances          0.88
    Act Density 0.478%

    No Known Activations