
Explanations

unwanted items or outcomes

The neuron consistently fires on occurrences of the adjective “unwanted.”


Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted


Negative Logits

"拚"  -2.50
[token not rendered]  -2.44
"乄"  -2.42
"despite"  -2.34
"瘣"  -2.31
"⧠"  -2.17
"骉"  -2.11
"慫"  -2.08
"ing"  -2.05
" Kraków"  -2.03

Positive Logits

" These"  2.64
" andere"  2.41
"er"  2.39
"These"  2.31
"is"  2.23
" exquisite"  2.16
" kaas"  2.08
"最重要"  2.05
"突破"  2.02
"واصل"  1.98

Activation Density: 0.001%
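To put the density figure in perspective, the short sketch below multiplies out the dashboard sample size from the Configuration section and estimates how many tokens the feature would be active on. It assumes, without confirmation from the page, that the reported density is the fraction of tokens in that dashboard sample on which the feature fires; since the displayed 0.001% is rounded, the resulting count is only approximate.

```python
# Dashboard sample size, taken from the Configuration section.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.001  # reported activation density, in percent (rounded)

# Total number of tokens in the dashboard sample.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = fraction of these sampled tokens on which the feature is active.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"expected active tokens: {expected_active:.1f}")
```

Under that assumption the sample holds 3,145,728 tokens and the feature would fire on roughly 31 of them, which is consistent with a narrow feature that responds to a single adjective such as "unwanted."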