    Explanations

    mentions of political figures and events in the context of government policies and international relations

    Negative Logits
    Token        Logit
    "replace"    -0.81
    "gat"        -0.79
    " namely"    -0.76
    "rand"       -0.72
    "craft"      -0.71
    "ftime"      -0.70
    "thood"      -0.68
    "watching"   -0.68
    "/ "         -0.67
    ".--"        -0.66
    Positive Logits
    Token         Logit
    " entire"     1.51
    " entirety"   1.45
    " remainder"  1.35
    " slightest"  1.29
    " same"       1.24
    " whole"      1.19
    " latter"     1.17
    "ses"         1.16
    " smallest"   1.15
    " brunt"      1.13
    Activation Density: 1.546%

    No Known Activations