INDEX
    Explanations

    instances of hypocrisy within political and social discourse

    Negative Logits
    Token        Logit
    " inst"      -0.17
    "orta"       -0.16
    " scaleX"    -0.15
    "/../"       -0.15
    "pun"        -0.14
    "elman"      -0.14
    "μή"         -0.13
    "uder"       -0.13
    "orte"       -0.13
    " flo"       -0.13
    Positive Logits
    Token        Logit
    "oux"        0.16
    "illum"      0.15
    "ault"       0.15
    " Neck"      0.15
    "è¶Ĭ"        0.14
    "IJ"         0.14
    "iggs"       0.14
    " Candle"    0.14
    "iks"        0.14
    "edList"     0.14
    Activation Density: 0.174%

    No Known Activations