INDEX
    Explanations

    themes of deception and manipulation in narratives

    Negative Logits
    odia            -0.17
    engu            -0.17
    asser           -0.16
    itez            -0.15
    TextLabel       -0.15
    à¹Ģà¸ķà¸Ńร    -0.14
     Taken          -0.14
    llx             -0.14
    uba             -0.13
    ìĿį             -0.13
    Positive Logits
    agem            0.18
    arov            0.17
    ibur            0.16
     behind         0.16
     preco          0.15
    ár              0.15
    ourt            0.15
    igo             0.15
     plan           0.14
     æİ§            0.14
    Activation Density: 0.328%

    No Known Activations