Explanations

corruption and disruption

The neuron fires on the subword “rupt,” as in “corrupt,” “disrupt,” and “endocrine-disrupting.”
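As a rough illustration of this explanation, the sketch below flags words that contain the string "rupt". It is only a string-level proxy: the model's real tokenizer may split these words differently, and the helper name `words_containing_rupt` and the sample sentence are hypothetical, not taken from the dashboard.

```python
import re

# Hypothetical helper: a string-level stand-in for the "rupt" subword.
# It does not reproduce the model's tokenization.
RUPT_WORD = re.compile(r"\b\w*rupt\w*\b", re.IGNORECASE)


def words_containing_rupt(text: str) -> list[str]:
    """Return every word in `text` that contains the substring 'rupt'."""
    return RUPT_WORD.findall(text)


# Made-up sample sentence, used only for illustration.
sample = "Endocrine-disrupting chemicals can corrupt signalling and disrupt growth."
matches = words_containing_rupt(sample)
print(matches)
```

Words found this way are only candidates for where the neuron might activate; the dashboard's recorded activations remain the authoritative evidence.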

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"厹"  -2.55
"伥"  -2.44
"actéristi"  -2.33
"ierenden"  -2.28
" echte"  -2.22
" quintessential"  -2.19
"噲"  -2.17
"豨"  -2.17
" colorida"  -2.11
"神色"  -2.11

Positive Logits

(no visible token text)  3.36
(no visible token text)  2.78
"其妙"  2.75
"while"  2.44
"with"  2.44
"🤠"  2.39
"剧照"  2.36
" seems"  2.22
"的很"  2.17

Activations Density 0.006%
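To put the density figure in perspective, the sketch below converts it into an approximate count of activating tokens. It assumes the density is the fraction of token positions with a nonzero activation, and that it was measured over the dashboard prompt set listed under Configuration. Neither assumption is confirmed by the page.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.006  # "Activations Density 0.006%"

# Total token positions in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density is the share of token positions where the neuron is active.
approx_active = total_tokens * DENSITY_PERCENT / 100

print(total_tokens)          # 3145728
print(round(approx_active))  # 189
```

Under those assumptions, the neuron is active on roughly 189 of about 3.1 million token positions, which fits a feature tied to a single, fairly rare subword.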