    Explanations

    concepts related to deception and credibility

    Negative Logits
    ób
    -0.14
    بعد
    -0.13
     especially
    -0.13
    onec
    -0.13
    λω
    -0.13
    ालन
    -0.13
    _BEFORE
    -0.12
     прежде
    -0.12
    _named
    -0.12
    oví
    -0.12
    Positive Logits
     Conversely
    0.58
     convers
    0.53
     whereas
    0.44
     Whereas
    0.43
     meanwhile
    0.43
     Meanwhile
    0.41
     naopak
    0.40
    Meanwhile
    0.39
     Likewise
    0.38
     likewise
    0.38
    Act Density 0.213%

    No Known Activations