INDEX
    Explanations

    concepts related to truth and deception in discourse

    Negative Logits
    Token      Logit
    "zcze"     -0.15
    "ville"    -0.15
    "oni"      -0.15
    "idis"     -0.15
    "inc"      -0.15
    "abr"      -0.15
    "od"       -0.14
    "ama"      -0.14
    " اخت"     -0.14
    "zl"       -0.14
    Positive Logits
    Token          Logit
    " reminded"    0.18
    "ieder"        0.17
    "uetype"       0.17
    "ekk"          0.16
    "ngth"         0.16
    "oldem"        0.16
    "issan"        0.15
    " Exactly"     0.15
    "anmar"        0.15
    " remind"      0.15
    Activation Density: 0.006%

    No Known Activations