INDEX
    Explanations

    phrases related to political ideologies and extremist groups

    Negative Logits
     And        -0.60
    less        -0.58
     Of         -0.57
    <bos>       -0.57
     No         -0.57
     But        -0.57
     At         -0.56
     As         -0.56
    BeforeAll   -0.56
     My         -0.56
    Positive Logits
     effe       1.62
     wien       1.59
     increa     1.53
     deleter    1.51
     suspic     1.50
     aen        1.49
     sovere     1.49
     nece       1.49
     fatis      1.49
     pessi      1.48
    Activation Density: 0.068%

    No Known Activations