INDEX
Explanations
words or tokens related to programming, technical terms, or conversational roles within code or instruction-like contexts.
instructions about how to analyze, process, or structure responses to user queries.
chat-style conversation scaffolding, especially role markers, prompt/instruction meta text, and assistant reply boilerplate within multi-turn dialogues.
references to specific test strings or identifiers (particularly "davidjl") being analyzed or manipulated in conversational exchanges.
Negative Logits
iap    -0.08
atient    -0.07
😳    -0.07
eng    -0.07
事を    -0.07
正是因为    -0.07
Concat    -0.07
jt    -0.07
ماذا    -0.07
.LayoutControlItem    -0.07
Positive Logits
giải    0.08
_of    0.08
_expr    0.08
продук    0.07
submission    0.07
_SPLIT    0.07
老婆    0.07
.div    0.07
ся    0.07
пара    0.07
Activations Density 42.091%