Explanations
Perhaps, alas, marvellous
Whether a token is part of the model's (assistant's) generated reply, i.e., a signal of assistant-produced text.
Negative Logits
guys: 0.94
weird: 0.93
guy: 0.89
weird: 0.84
awesome: 0.78
kids: 0.78
enggak: 0.78 (colloquial Indonesian, "no/not")
dude: 0.76
guys: 0.76
udah: 0.76 (colloquial Indonesian, "already")
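Positive Logits
весьма: 1.01 (Russian, "highly, very")
столь: 0.91 (Russian, "so, such")
便是: 0.87 (Chinese, "is precisely, then is")
alas: 0.86
Perhaps: 0.85
Perhaps: 0.85
marvellous: 0.84
之事: 0.84 (Chinese, "matter of, affair")
perhaps: 0.84
अवश्य: 0.84 (Hindi, "certainly")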
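Logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the highest- and lowest-scoring vocabulary tokens. The page does not say how its lists were computed, so the following is only a minimal sketch under that assumption; the helper `feature_logit_effects`, the d_model × vocab_size layout of the unembedding, and the toy inputs are all hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (top_positive, top_negative) token/score lists.

    decoder_direction: the feature's decoder vector, length d_model (assumed).
    unembedding: W_U laid out as d_model rows x vocab_size columns (assumed).
    """
    d_model = len(decoder_direction)
    # Score each vocabulary token by the dot product between the decoder
    # direction and that token's unembedding column.
    scores = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    # Largest scores are the tokens the feature promotes most directly.
    top_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest scores are the tokens the feature suppresses most directly.
    top_negative = sorted(pairs, key=lambda p: p[1])[:k]
    return top_positive, top_negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
pos, neg = feature_logit_effects(
    [1.0, 0.0],
    [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
    ["a", "b", "c"],
    k=2,
)
print(pos)
print(neg)
```

Because only the first coordinate of the toy decoder direction is nonzero, the second row of the unembedding has no effect on the scores.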
Activation density: 0.273%
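An activation density of 0.273% means the feature fires on roughly 2.73 of every 1,000 tokens (0.273 / 100 × 1,000). A minimal sketch of that statistic, assuming density is the percentage of tokens whose activation exceeds zero; the threshold and the helper name `activation_density` are hypothetical, not taken from the page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy example: 1,000 tokens, of which 3 activate the feature.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density}%")
```

With 3 active tokens out of 1,000, the sketch reports 0.3%, the same order of magnitude as the density shown on the page.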