    Explanations

    Truth and deception

    Negative Logits
    Token             Logit
    " ])->"           -0.07
    "-transition"     -0.07
    "_aux"            -0.06
    "Indent"          -0.06
    ".Import"         -0.06
    " Property"       -0.06
    " DEALINGS"       -0.06
    " insights"       -0.06
    (blank)           -0.06
    "‌پدیای"           -0.06
    Positive Logits
    Token             Logit
    " truthful"       0.07
    " brand"          0.06
    " Canal"          0.06
    " surveys"        0.06
    ".");↵"           0.06
    " uf"             0.06
    " Bucket"         0.06
    " sockets"        0.06
    " lokal"          0.06
    "	boolean"        0.06
    Activation Density: 0.067%

    No Known Activations