INDEX
    Explanations

    concepts related to hypocrisy and inconsistent beliefs

    Negative Logits
    "ÙĦات"    -0.15
    " Bien"    -0.15
    " blas"    -0.15
    "éric"     -0.14
    "iek"      -0.14
    "uid"      -0.14
    "iu"       -0.14
    " cyn"     -0.14
    "iment"    -0.13
    "ester"    -0.13
    Positive Logits
    " straw"        0.19
    " defenses"     0.18
    " Straw"        0.17
    " defenders"    0.17
    "skyt"          0.16
    " attacks"      0.16
    "oje"           0.16
    "wap"           0.15
    " defend"       0.15
    " pole"         0.15
    Activation Density: 0.840%

    No Known Activations