INDEX
    Explanations

    discussions about hypocrisy and the coherence of beliefs and actions

    Negative Logits
    ÑĤÑĢа        -0.14
     sport       -0.14
    lenen        -0.14
    SEG          -0.14
    bah          -0.14
     showc       -0.14
    RunWith      -0.14
    urgeon       -0.13
     physic      -0.13
     Anadolu     -0.13
    Positive Logits
     Alic        0.16
     meta        0.14
    andard       0.14
    .Meta        0.14
    riel         0.14
    meta         0.14
    *>(&         0.14
    mmas         0.14
    opia         0.13
     célib       0.13
    Act Density 0.140%

    No Known Activations