Explanations
instructions and queries
text that contains explicit instructions, rules, or constraints directing the assistant's behavior (system prompts and policy-style directives).
language that conveys formal task specifications—constraints, procedural instructions, policies, links/resources, templates/formats, and feature requirements.
Negative Logits
Incidentally    0.41
defn            0.40
doubtless       0.39
paltry          0.38
<unused303>     0.38
ostensibly      0.38
stalwart        0.37
<unused2049>    0.36
<unused267>     0.36
THRESH          0.36
Positive Logits
bellow          0.61
´               0.60
advices         0.56
planification   0.56
lenght          0.56
ressources      0.55
Nowadays        0.55
wich            0.53
partecip        0.52
restauration    0.52
Activations Density: 0.073%