INDEX
    Explanations

    phrases related to political figures or political actions

    words related to a specific cultural or regional identity

    Negative Logits
    ques
    -0.62
    ICES
    -0.61
    OME
    -0.61
     textbooks
    -0.60
     Seymour
    -0.58
    gyn
    -0.58
    ãĥ¡
    -0.58
    Mods
    -0.57
     attendance
    -0.57
     Brexit
    -0.55
    Positive Logits
    lasses
    1.29
    nir
    1.12
    regate
    1.03
    aroo
    1.01
    sa
    0.99
    sung
    0.99
    oing
    0.98
    fu
    0.98
    sv
    0.96
    sten
    0.90
    Act Density 0.049%

    No Known Activations