INDEX
    Explanations

    instances of dishonesty or discrepancies in statements

    Negative Logits
    (token not shown)  -0.63
    "↵↵"               -0.63
    ","                -0.57
    " parcourir"       -0.55
    "SourceChecksum"   -0.52
    " and"             -0.48
    "N"                -0.48
    (token not shown)  -0.47
    "."                -0.47
    "רושלים"           -0.46
    Positive Logits
    " Italijanski"     0.75
    " ddelweddau"      0.71
    "BagConstraints"   0.69
    " Vikipedi"        0.68
    "RTGC"             0.67
    " Inti"            0.67
    "]--;"             0.66
    " IPX"             0.64
    " Enumer"          0.64
    "theros"           0.63
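    Logit lists like the negative and positive ones on this page are commonly built by projecting a feature's decoder direction through the model's unembedding matrix, then keeping the lowest and highest entries. The page does not say how its lists were computed, so the following is only a minimal sketch under that assumption. `decoder_dir`, `W_U`, `vocab`, and `top_logit_tokens` are hypothetical names used for illustration.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature direction's direct effect on logits.

    Assumed (hypothetical) inputs:
      decoder_dir: shape (d_model,), the feature's decoder vector
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list of token strings of length vocab_size
    """
    logits = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(logits)          # indices sorted from lowest to highest
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, 0.5]])
neg, pos = top_logit_tokens(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('c', -1.0), ('b', 0.0)]
print(pos)  # [('a', 1.0), ('d', 0.5)]
```

    A real dashboard would use the full vocabulary and k=10, which matches the ten entries shown in each list above.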
    Activation Density: 0.019%
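    Activation density is usually read as the fraction of tokens on which the feature's activation is above zero. The sketch below assumes that definition and a hypothetical `activation_density` helper, since the page does not state the exact threshold or dataset. Under that reading, 0.019% corresponds to roughly 19 activating tokens in every 100,000.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `acts` is an assumed 1-D array of one feature's activations over a dataset.
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# 19 active positions out of 100,000 gives a density of 0.019%.
acts = np.zeros(100_000)
acts[:19] = 1.0
print(f"{activation_density(acts) * 100:.3f}%")  # 0.019%
```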

    No Known Activations