INDEX
    Explanations

    secretly wronged or useless

    Negative Logits
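    (token missing)    0.53
    (token missing)    0.49
    (token missing)    0.48
    (token missing)    0.47
     фараз             0.46
     esimerk           0.46
     structuring       0.45
    (token missing)    0.45
    (token missing)    0.45
    (token missing)    0.44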
    Positive Logits
     blackmail    0.49
     secretly     0.48
    ji            0.48
     wronged      0.46
     useless      0.44
     stupid       0.44
     hehe         0.43
    hehe          0.43
     pretended    0.43
     traitor      0.43
    Act Density 0.005%

    No Known Activations