INDEX
Explanations
Dialogue/conversational
Tokens that are part of the assistant's generated output (i.e., the assistant role / response text).
Negative Logits
19            -0.08
beats         -0.07
someone       -0.06
60            -0.06
uro           -0.06
membership    -0.06
30            -0.06
28            -0.06
Occ           -0.06
20            -0.06
Positive Logits
��                0.08
hôn               0.07
ิร                0.07
Не                0.07
.setContentType   0.06
setFont           0.06
ruku              0.06
etwa              0.06
ább               0.06
luego             0.06
Activations Density: 0.069%