INDEX
    Explanations

    words related to misrepresentation or misrepresentation itself

    terms related to misrepresentation and its nuances

    NEGATIVE LOGITS
    ï¸
    -0.69
    STON
    -0.69
    mith
    -0.69
    \\\\\\\\\\\\\\\\
    -0.68
    pei
    -0.64
    cius
    -0.63
    WAYS
    -0.63
     nerv
    -0.63
    vantage
    -0.62
    creen
    -0.61
    POSITIVE LOGITS
    ation
    1.62
    ations
    1.51
    ed
    1.21
    ing
    1.03
    ated
    1.01
    ating
    0.99
    eering
    0.98
    atives
    0.98
    ational
    0.97
    edly
    0.95
    Activation Density: 0.036%

    No Known Activations