Explanations
model responses
the assistant’s turn marker in chat-style transcripts (i.e., the model speaker tag)
Negative Logits
payload: 0.36
STM: 0.36
rq: 0.35
Application: 0.35
applying: 0.35
generated: 0.35
benchmark: 0.34
तक्र: 0.33
heb: 0.33
aplicar: 0.33
Positive Logits
โร: 0.33
来说: 0.30
்கலை: 0.29
사람들이: 0.29
свадь: 0.29
সীমান্ত: 0.29
今天: 0.28
разговари: 0.28
ServletException: 0.28
ഴിലാ: 0.28
Activations Density: 0.124%