INDEX
    Explanations

    references to political figures and entities

    words related to specific individuals or entities

    Negative Logits
    "channelAvailability"    -0.64
    "SHIP"                   -0.62
    "bourg"                  -0.60
    " confines"              -0.60
    " CPC"                   -0.57
    " STATS"                 -0.55
    "resses"                 -0.55
    " spat"                  -0.54
    " goats"                 -0.54
    "azes"                   -0.54
    Positive Logits
    "Wan"            0.91
    "leck"           0.85
    "ratulations"    0.81
    "worldly"        0.80
    "wald"           0.76
    "uary"           0.71
    "aida"           0.70
    "yssey"          0.69
    "llor"           0.67
    "untu"           0.66
    Act Density 0.090%

    No Known Activations