INDEX
    Explanations

    instances of the word "wrong" and its variations

    Negative Logits
    Token         Logit
    "TIS"         -0.56
    "+*"          -0.53
    "calfe"       -0.53
    "hibli"       -0.52
    " Periods"    -0.50
    "ofan"        -0.47
    " CCM"        -0.47
    "Deli"        -0.47
    " Tahiti"     -0.47
    "uride"       -0.47
    Positive Logits
    Token         Logit
    " wrong"      1.23
    "wrong"       1.20
    "Wrong"       1.18
    " Wrong"      1.13
    " WRONG"      1.05
    "WRONG"       1.05
    " wrongs"     0.85
    " sbag"       0.78
    " wrongful"   0.71
    " incorrect"  0.70
    Activation Density: 0.009%

    No Known Activations