Explanations

instructions and queries

np_acts-logits-general · gemini-2.5-flash-lite

text that contains explicit instructions, rules, or constraints directing the assistant's behavior (system prompts and policy-style directives).

oai_token-act-pair · gpt-5-mini Triggered by @vetterc0

language that conveys formal task specifications—constraints, procedural instructions, policies, links/resources, templates/formats, and feature requirements.

oai_token-act-pair · gpt-5 Triggered by @vetterc0

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_40_width_262k_l0_medium

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1
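The SAE identifier above packs several settings into one path. Below is a minimal sketch that splits it into fields. The field names (`org`, `release`, `hook`, `layer`, `width`, `l0`) and the function `parse_sae_id` are hypothetical labels of ours and are not part of any library. The sketch also assumes, as in the original Gemma Scope releases, that a width tag such as `262k` is a rounded label for a power of two (here 2^18 = 262,144 latents).

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SaeId:
    """Fields of an SAE identifier (field names are hypothetical labels)."""
    org: str
    release: str
    hook: str
    layer: int
    width: int  # number of latents
    l0: str     # sparsity variant tag, e.g. "medium"


# Matches names like "layer_40_width_262k_l0_medium".
_NAME = re.compile(r"layer_(\d+)_width_(\d+)([km])_l0_(\w+)")
_UNIT = {"k": 1_000, "m": 1_000_000}


def parse_sae_id(sae_id: str) -> SaeId:
    """Split 'org/release/hook/name' into its parts (assumed layout)."""
    org, release, hook, name = sae_id.split("/")
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized SAE name: {name!r}")
    layer, count, unit, l0 = match.groups()
    nominal = int(count) * _UNIT[unit]
    # Assumption: widths are powers of two and the tag is a rounded label,
    # so snap to the nearest power of two (262k -> 2**18 = 262,144).
    width = 1 << round(math.log2(nominal))
    return SaeId(org, release, hook, int(layer), width, l0)
```

For this dashboard, `parse_sae_id("google/gemma-scope-2-27b-it/resid_post/layer_40_width_262k_l0_medium")` returns `SaeId(org='google', release='gemma-scope-2-27b-it', hook='resid_post', layer=40, width=262144, l0='medium')`.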

Negative Logits

"Incidentally"  0.41
"defn"  0.40
" doubtless"  0.39
" paltry"  0.38
"<unused303>"  0.38
" ostensibly"  0.38
" stalwart"  0.37
"<unused2049>"  0.36
"<unused267>"  0.36
"THRESH"  0.36

Positive Logits

" bellow"  0.61
"´"  0.60
" advices"  0.56
" planification"  0.56
" lenght"  0.56
" ressources"  0.55
" Nowadays"  0.55
" wich"  0.53
" partecip"  0.52
" restauration"  0.52
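The two lists rank vocabulary tokens by how strongly this latent pushes their next-token logits down (negative) or up (positive). A common way to compute such lists is to project the latent's decoder vector through the model's unembedding matrix and keep the extremes. The sketch below assumes that approach, ignores the final normalization layer, and uses a hypothetical `top_logits` helper on toy numbers. Real SAE and model weights would normally go through NumPy or PyTorch matrix products rather than plain lists.

```python
from typing import Sequence


def top_logits(
    w_dec_row: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most-promoted and k most-suppressed tokens for one latent.

    Assumes the direct logit effect is w_dec_row @ w_unembed, where w_unembed
    has shape (d_model, vocab_size); the final normalization layer is ignored.
    """
    # Column j of w_unembed is the unembedding vector for token j.
    logits = [sum(d * u for d, u in zip(w_dec_row, col)) for col in zip(*w_unembed)]
    order = sorted(range(len(logits)), key=logits.__getitem__)  # ascending
    negative = [(vocab[i], logits[i]) for i in order[:k]]            # most suppressed first
    positive = [(vocab[i], logits[i]) for i in reversed(order[-k:])]  # most promoted first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["alpha", "beta", "gamma", "delta"]
w_dec_row = [1.0, 0.0]
w_unembed = [
    [0.6, -0.4, 0.5, -0.3],  # d_model dimension 0
    [0.2, 0.1, -0.9, 0.0],   # d_model dimension 1
]
pos, neg = top_logits(w_dec_row, w_unembed, vocab, k=2)
# pos == [("alpha", 0.6), ("gamma", 0.5)]
# neg == [("beta", -0.4), ("delta", -0.3)]
```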

Activation Density: 0.073%
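Activation density is the share of token positions on which the latent fires. The sketch below uses a hypothetical `activation_density` helper that counts activations above zero (the `threshold` parameter is our own addition). It then relates the 0.073% figure to the dashboard configuration under the unverified assumption that density was measured over every token of the 238,145 prompts of 512 tokens each.

```python
from typing import Iterable


def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where a latent's activation exceeds `threshold`."""
    values = list(acts)
    if not values:
        raise ValueError("need at least one activation")
    return sum(a > threshold for a in values) / len(values)


# Scale check against this dashboard's configuration, assuming (unverified)
# that the 0.073% density covers every token of the prompt set.
total_tokens = 238_145 * 512                  # 121,930,240 token positions
expected_firing = total_tokens * 0.073 / 100  # about 89,009 positions
```

Under that assumption, the latent would fire on roughly 89,000 of about 122 million token positions, or about 1 in 1,370.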

No Known Activations
