INDEX
    Explanations

    terms related to political ideologies and their critiques

    Negative Logits
    e           -0.80
    wdata       -0.69
     abiti      -0.66
    laşı        -0.62
    caption     -0.60
    Kla         -0.59
     Fletcher   -0.59
    ț           -0.58
     lotes      -0.58
     Kla        -0.58
    Positive Logits
     itſelf         0.95
     myſelf         0.94
     Efq            0.88
     '\\;'          0.85
     becauſe        0.84
     BoxDecoration  0.83
     leaſt          0.83
    cist            0.83
     depositphotos  0.83
    izm             0.82
    Act Density 0.143%

    No Known Activations