Explanations

body parts and hospital

np_acts-logits-general · gemini-2.5-flash-lite

words and phrases related to physical location, position, or bodily sensations.

oai_token-act-pair · claude-3-7-sonnet-20250219 · Triggered by @neilrathi

This neuron picks up on words that signal a warning or alarm, i.e. terms used in issuing cautions, trigger warnings, or otherwise alarming statements.

oai_token-act-pair · o4-mini · Triggered by @jyhe0408

Configuration

google/gemma-scope-27b-pt-res/layer_22/width_131k

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
"ături"                                  -0.97
"acht"                                   -0.93
" guerrilla"                             -0.89
" éviter"                                -0.87
" chiede"                                -0.82
"ziek"                                   -0.82
"tolsó"                                  -0.82
" ウエスト"                               -0.82
"čný"                                    -0.82
" значит"                                -0.81

Positive Logits

Token (quoted to show leading spaces)    Logit
"atória"                                 0.86
"因为"                                    0.84
"więks"                                  0.83
"toprule"                                0.82
"Lycka"                                  0.81
"いだ"                                    0.81
"chok"                                   0.80
"his"                                    0.79
" burst"                                 0.78
"形容"                                    0.78

Activations Density 0.052%
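
To put the density figure in context, here is a rough back-of-the-envelope sketch. It assumes (this page does not state it) that "Activations Density" means the fraction of dashboard tokens on which the feature has a nonzero activation, and that it was measured over the 24,576 prompts of 128 tokens listed under Configuration. The function name and constants are illustrative only.

```python
# Assumption: density = fraction of dashboard tokens with a nonzero activation.
# The prompt count, prompt length, and density value are copied from this page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.052


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature fires)."""
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total dashboard tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:.0f}")
```

Under these assumptions the feature would fire on roughly 1,600 of about 3.1 million tokens, which is consistent with a sparse feature.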

No Known Activations
