    Explanations

    phrases related to public opinion or voting outcomes

    references to the concept of popularity in various contexts

    Negative Logits
    Token        Logit
    "thur"       -0.75
    "ourke"      -0.70
    " Aviv"      -0.70
    "abetic"     -0.68
    " Kear"      -0.67
    "agher"      -0.67
    "xual"       -0.66
    "ritch"      -0.64
    " Territ"    -0.63
    "ASC"        -0.63

    Positive Logits
    Token        Logit
    "ized"       0.96
    "izing"      0.92
    "isations"   0.91
    "izations"   0.89
    "ity"        0.87
    "ised"       0.86
    "ization"    0.76
    "izer"       0.76
    "ize"        0.74
    "izers"      0.73

    Activation Density: 0.014%

    No Known Activations