Explanations

risk of negative outcomes

This neuron primarily activates on language signaling risk, danger, or threat (e.g., “in danger of,” “risk of,” “threatens to”).
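
As a rough illustration of the kind of text described above, the following sketch flags the three example phrases with a regular expression. It is only a surface-level proxy, not a description of how the neuron itself works; the phrase list, `RISK_PATTERN`, and `find_risk_phrases` are hypothetical names introduced here, seeded only with the examples quoted in the explanation.

```python
import re
from typing import List

# Hypothetical phrase list: only the three examples quoted in the explanation.
RISK_PATTERN = re.compile(r"\b(in danger of|risk of|threatens to)\b", re.IGNORECASE)


def find_risk_phrases(text: str) -> List[str]:
    """Return each matched risk phrase, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in RISK_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "The bridge is in danger of collapse and threatens to block the river."
    print(find_risk_phrases(sample))
```

A keyword matcher like this misses paraphrases and context that a learned feature may capture, so it is best treated as a quick sanity check against the top-activating examples.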

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

[token missing]  -1.99
"ностях"  -1.68
" viera"  -1.54
"あとは"  -1.54
"なんとか"  -1.46
" want"  -1.45
"after"  -1.44
"最後は"  -1.43
"しましたが"  -1.40
" umane"  -1.40

Positive Logits

"嫱"  1.73
"鈇"  1.59
" naudoti"  1.49
"så"  1.43
"ЛО"  1.42
"高"  1.42
" keinginan"  1.41
"完全"  1.40
"aikan"  1.38
"ători"  1.37

Activations Density 0.022%
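
To put the density figure in context, the arithmetic below estimates how many tokens that would correspond to in the dashboard prompt set. This is a sketch under two assumptions that the page does not state: that the density was measured over the same 24,576 prompts of 128 tokens listed under Configuration, and that it is the fraction of tokens on which the neuron has a nonzero activation.

```python
# Values taken from the dashboard; the link between them is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.022  # "Activations Density 0.022%"

# Total tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumed meaning: share of tokens with a nonzero activation.
approx_active_tokens = round(total_tokens * DENSITY_PERCENT / 100)

if __name__ == "__main__":
    print(f"total tokens: {total_tokens:,}")
    print(f"approx. activating tokens: {approx_active_tokens:,}")
```

Under those assumptions the neuron fires on only a few hundred tokens out of roughly three million, which fits a narrowly scoped feature.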