    Explanations

    phrases related to political topics

    Negative Logits
    Token       Logit
    >>>>>>>>    -0.62
    LAN         -0.61
    Comb        -0.59
     Tues       -0.59
    aco         -0.58
     Dating     -0.58
     roundup    -0.57
    ibal        -0.57
     sorts      -0.57
     Fresh      -0.56
    Positive Logits
    Token       Logit
    iris        0.97
     perceive   0.86
     oppose     0.81
    rir         0.80
     partake    0.79
     whom       0.78
     offend     0.78
     consume    0.76
     "$:/       0.74
     harmed     0.74
    Activation Density: 0.272%

    No Known Activations