Explanations

guardian, angel, protector

The neuron activates on instances of the word “Guardian” (especially in legal headings and titles).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
"krieg"        -0.93
"🇿"            -0.92
" incendio"    -0.90
" manches"     -0.89
"decken"       -0.89
" expire"      -0.85
" Lazio"       -0.85
" laquelle"    -0.84
"ļa"           -0.84
"endorong"     -0.84

Positive Logits

Token          Logit
" angel"       1.62
" angels"      1.52
" Angel"       1.45
" Angels"      1.26
"天使"         1.22
"angels"       1.15
"Angel"        1.09
" guardian"    1.05
" Guardian"    1.00
"angel"        0.99

Activations Density 0.011%
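
To give the density figure a sense of scale, the short sketch below combines it with the prompt configuration listed above. It rests on an assumption the page does not state: that "Activations Density" means the fraction of dashboard tokens on which the feature has a nonzero activation. The variable names are illustrative only, and because the density is rounded to two significant figures the result is an approximation.

```python
# Assumption (not stated on the page): "Activations Density" is the fraction
# of dashboard tokens on which the feature has a nonzero activation.
NUM_PROMPTS = 24_576        # from "24,576 prompts"
TOKENS_PER_PROMPT = 128     # from "128 tokens each"
DENSITY_PERCENT = 0.011     # from "Activations Density 0.011%"

# Total number of tokens covered by the dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of tokens on which the feature fires.
estimated_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:.0f}")
```

Under that assumption, the feature fires on roughly 346 of about 3.1 million tokens, which is consistent with a sparse feature tied to a narrow set of words.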