INDEX
    Explanations

    concepts related to contradictions and moral dilemmas in discourse

    Negative Logits
    "abbo"          -0.17
    "llib"          -0.16
    " ๆ"            -0.15
    "ruh"           -0.14
    ".removeAll"    -0.14
    " inexp"        -0.14
    "uddenly"       -0.14
    "رك"            -0.14
    "сит"           -0.14
    "751"           -0.14
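Several tokens in these lists come from a GPT-2-style byte-level BPE vocabulary. In that scheme each raw byte is stored as a printable stand-in character, and the bytes that are not printable are shifted to code points starting at U+0100, which is why a raw entry such as `ÑģиÑĤ` decodes to `сит`. The minimal sketch below assumes that standard byte-to-unicode table (the entries here decode to coherent text under it). `decode_token` is a hypothetical helper, and passing characters outside the table through as UTF-8 is an assumption made so that partly decoded strings like `ÑģиÑĤ` still come out right.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # Every remaining byte is shifted to an unused code point from 256 upward.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level BPE vocabulary string back to readable text (sketch)."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Assumption: characters outside the table are already-decoded
            # text, so re-encode them as UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # A token can stop partway through a multi-byte character; show the
    # leftover bytes as escapes instead of raising an error.
    return raw.decode("utf-8", errors="backslashreplace")


for tok in [" à¹Ĩ", "رÙĥ", "ÑģиÑĤ", "èĭ¥", "å¦Ĥ", "è¿ĺæľī", "è¦"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Keeping the escaped bytes for a truncated token such as `è¦` makes it visible that the token is only a fragment of a character rather than silently dropping it.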
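    Positive Logits
    "若"            0.17   (Chinese: "if")
    "如"            0.17   (Chinese: "if; as")
    "akin"          0.15
    "виж"           0.15
    "还有"          0.15   (Chinese: "also; there is still")
    " unless"       0.14
    "orney"         0.14
    "è¦"            0.14   (raw bytes E8 A6: the first two bytes of a three-byte UTF-8 character)
    " True"         0.14
    " until"        0.14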
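Feature dashboards commonly build lists like the two above by projecting the feature's decoder direction through the model's unembedding matrix and reporting the most negative and most positive token scores. This page does not state its method, so the following is only a sketch under that assumption, using a made-up three-dimensional model with a six-token vocabulary. `logit_effects`, `top_logits`, and all weights are hypothetical, and any normalization applied before display is not modeled.

```python
def logit_effects(decoder_direction, unembedding):
    """Dot the feature's decoder vector with each vocabulary column of W_U."""
    d_model = len(decoder_direction)
    vocab_size = len(unembedding[0])
    return [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(effects, vocab, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy values for illustration only; they are not this feature's weights.
vocab = [" unless", " until", " True", "abbo", "llib", "751"]
decoder_direction = [0.5, -0.2, 0.1]
unembedding = [  # shape (d_model, vocab_size)
    [0.3, 0.2, 0.1, -0.3, -0.2, 0.0],
    [-0.1, 0.0, -0.2, 0.1, 0.2, 0.1],
    [0.2, 0.1, 0.3, -0.1, 0.0, -0.4],
]

effects = logit_effects(decoder_direction, unembedding)
negative, positive = top_logits(effects, vocab, k=2)
print("Negative:", [(tok, round(v, 2)) for tok, v in negative])
print("Positive:", [(tok, round(v, 2)) for tok, v in positive])
```

Because the score is a single dot product per token, it describes what the feature pushes the output toward directly, not which contexts make it fire.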
    Activation Density: 0.008%
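Activation density is usually reported as the fraction of sampled token positions on which the feature's activation is above zero. A density of 0.008% works out to roughly 8 firing positions per 100,000 tokens. The sampling corpus and threshold behind this figure are not stated, so the sketch below assumes a simple "greater than zero" rule. `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 token positions, of which 8 fire.
acts = [0.0] * 100_000
for pos in (12, 4_051, 9_999, 20_000, 33_333, 50_505, 77_777, 99_998):
    acts[pos] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")
```

At this sparsity, a modest sample can easily contain no firing positions at all.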

    No Known Activations