INDEX
    Explanations

    references to deception or falsehoods

    Negative Logits
    Token            Logit
    " Woodstock"     -0.58
    "Normally"       -0.56
    "onato"          -0.56
    " PMA"           -0.56
    " Carnaval"      -0.55
    "GPP"            -0.55
    " Ammon"         -0.53
    " Carthag"       -0.53
    "Davidson"       -0.53
    " ECR"           -0.53

    Positive Logits
    Token            Logit
    " lies"          1.91
    " Lies"          1.78
    "Lies"           1.68
    " lie"           1.41
    "lies"           1.41
    " mentiras"      1.27
    " mentira"       1.09
    " Lie"           1.04
    "lie"            1.01
    " lying"         0.97

    Activation Density: 0.007%

    No Known Activations