    Explanations

    Instances of the word "lie" and its variations (such as "lies" and "lying"), indicating themes of deception or falsehood.

    NEGATIVE LOGITS
    Token            Logit
    " Jacobsen"      -0.80
    " McCl"          -0.78
    "AndEndTag"      -0.76
    "ėl"             -0.74
    "рс"             -0.73
    "بالإنجليزية"    -0.73
    "σσ"             -0.72
    "agoza"          -0.71
    "urator"         -0.71
    " Medford"       -0.70
    POSITIVE LOGITS
    Token            Logit
    "Lie"            1.11
    " lie"           1.10
    " Lie"           1.02
    " LIE"           0.98
    "Lies"           0.96
    " Lies"          0.94
    " lying"         0.93
    "lying"          0.89
    " Lying"         0.89
    " lies"          0.89
    Activation Density: 0.088%

    No Known Activations