INDEX

Explanations

danger

The neuron detects occurrences of the root “danger,” activating strongly on words like “danger,” “dangerous,” and “dangerously.”

New Auto-Interp

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Embeds

IFrame

Link

Not in Any Lists

Negative Logits

.’”

-2.53

癭

-2.25

-}

-2.23

 ?”

-2.17

廕

-2.16

蛧

-2.14

 スクール

-2.14

梘

-2.09

 manten

-2.05

 οπο

-2.03

POSITIVE LOGITS

2.50

2.44

$,

2.42

This

2.28

 Approximately

2.22

 Presumably

2.20

ра

2.20

 Roughly

2.16

2.02

Activations Density 0.000%