INDEX
    Explanations

    Perhaps, alas, marvellous

    whether a token is part of the model/assistant's generated reply (i.e., signals assistant/model-produced text).

    Negative Logits
    " guys"  0.94
    " weird"  0.93
    " guy"  0.89
    "weird"  0.84
    " awesome"  0.78
    " kids"  0.78
    " enggak"  0.78
    " dude"  0.76
    "guys"  0.76
    " udah"  0.76
    Positive Logits
    " весьма"  1.01
    " столь"  0.91
    "便是"  0.87
    " alas"  0.86
    "Perhaps"  0.85
    " Perhaps"  0.85
    " marvellous"  0.84
    "之事"  0.84
    " perhaps"  0.84
    " अवश्य"  0.84
    Activation Density: 0.273%

    No Known Activations