    Explanations

    references to lies and deception

    instances of the word "lie" and its variations in different contexts

    Negative Logits
    Token                 Logit
    "arta"                -0.78
    "orr"                 -0.62
    "illion"              -0.61
    "lished"              -0.61
    " weaving"            -0.60
    "Attempts"            -0.60
    "200000"              -0.59
    "=-=-=-=-=-=-=-=-"    -0.58
    " liking"             -0.57
    "andal"               -0.57
    Positive Logits
    Token                 Logit
    "lies"                1.22
    "utenant"             0.92
    "lie"                 0.80
    "poons"               0.75
    "creen"               0.74
    " showc"              0.71
    "layer"               0.69
    "chool"               0.69
    "HF"                  0.67
    "ogyn"                0.67
    Activation Density: 0.003%

    No Known Activations