INDEX
    Explanations

    phrases associated with political positions and actions

    Negative Logits
    Token          Logit
    "aeda"         -0.16
    "ustralian"    -0.15
    "abic"         -0.15
    "erdem"        -0.15
    "amo"          -0.15
    "olen"         -0.14
    "ileged"       -0.14
    " åIJ"         -0.14
    "fak"          -0.14
    "odied"        -0.14
    Positive Logits
    Token          Logit
    " intent"       0.22
    " hell"         0.20
    " determined"   0.19
    "intent"        0.19
    " caught"       0.19
    " tone"         0.18
    " bent"         0.18
    " allergic"     0.18
    "toast"         0.18
    " boxed"        0.17
    Activation Density: 0.169%

    No Known Activations