Explanations

demeter & phishing

np_acts-logits-general · gemini-2.5-flash-lite

salient named entities and acronyms—proper names, titles, and specialized technical terms that signal key subjects across domains.

oai_token-act-pair · gpt-5 · Triggered by @vetterc0

The neuron activates on enumerated list entries—especially section headings or item titles (e.g. anime character names) in a ranked “waifu” list.

oai_token-act-pair · o4-mini · Triggered by @yooniel31

user query topics or keywords.

oai_token-act-pair · deepseek-r1 · Triggered by @yooniel31

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_31_width_262k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

৭  0.16
ل  0.16
rvGroup  0.15
name  0.15
ناہ  0.15
ఫ్  0.14
অন্যান্য  0.14
کے  0.14
ว  0.14
ുവരി  0.14

Positive Logits

[token not rendered]  0.19
[token not rendered]  0.18
。  0.17
と呼ばれる  0.16
®,  0.16
®.  0.15
had  0.14
യുടെ  0.14
！  0.14

Activations Density 10.766%

No Known Activations