INDEX
    Explanations

    phrases related to political entities or ideologies

    references to late-night shows and political left/right distinctions

    Negative Logits
    oused               -0.72
    iosyncr             -0.72
    è¦ļéĨĴ              -0.70
    ounter              -0.68
     externalToEVAOnly  -0.67
    ibly                -0.66
    äºĶ                 -0.66
    icable              -0.66
    ILY                 -0.65
    BILITY              -0.65
    Positive Logits
     Thing              0.97
     Ones               0.95
     Definition         0.92
     Order              0.88
     Day                0.88
     Responsibility     0.88
     Works              0.87
     Roads              0.87
     Lives              0.85
     Guys               0.85
    Activation Density: 0.150%

    No Known Activations