    Explanations

    phrases mentioning specific political figures

    references to specific names or entities, particularly those that are significant figures or organizations

    Negative Logits
    UAL           -0.78
    ewski         -0.73
     Gork         -0.69
    ateurs        -0.68
    nings         -0.68
    sing          -0.68
    rament        -0.67
    noon          -0.66
    Required      -0.64
    IBLE          -0.64
    Positive Logits
     Haram         1.04
    oro            1.00
    vernment       0.97
    NetMessage     0.90
    oko            0.88
    lé             0.87
    annis          0.83
    issan          0.76
    wana           0.75
    heastern       0.74
    Activation Density: 0.017%

    No Known Activations