INDEX

Explanations

negative emotions and events

The neuron fires on words expressing strong negative emotion (e.g., dismaying, disheartening, appalling, horrified, distraught).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token        Logit
(blank)      -2.52
(blank)      -2.02
"は"         -1.93
"деи"        -1.90
(blank)      -1.88
(blank)      -1.87
" Когда"     -1.80
(blank)      -1.79
(blank)      -1.77
"If"         -1.76

Positive Logits

Token        Logit
"膘"         2.20
"齶"         2.13
"死的"       2.09
" всички"    2.02
"῞"          2.02
"仉"         1.95
" milyen"    1.94
"发生的"     1.92
(blank)      1.91
"↵↵"         1.90

Activations Density 0.007%
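
To put the density figure in context, here is a rough back-of-the-envelope sketch. It assumes the density is measured over the dashboard prompts listed under Configuration, which the page does not state explicitly. The variable names are illustrative, and the displayed 0.007% is rounded, so the result is only an order-of-magnitude estimate.

```python
# Rough token count implied by the dashboard figures.
# Assumption (not stated on the page): the activation density is
# computed over the 24,576 dashboard prompts of 128 tokens each.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.007 / 100  # 0.007% expressed as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = total_tokens * ACTIVATION_DENSITY

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:.0f}")
```

Under that assumption, the feature is active on only a couple of hundred tokens out of roughly three million, which fits a sparse feature tied to a narrow set of strongly negative words.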