INDEX

Explanations

people water

The neuron activates strongly on first-person, self-referential language, especially “I” and related pronouns and constructions in which the speaker talks about themselves.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token           Logit
" kuliah"       -0.87
" load"         -0.86
"iseen"         -0.85
"료"            -0.84
" food"         -0.83
"Techniques"    -0.83
"wife"          -0.82
" vektör"       -0.81
" rock"         -0.81
" خونه"         -0.81

Positive Logits

Token           Logit
"Whilst"        0.99
" slay"         0.97
"ATR"           0.96
" Britney"      0.95
"キム"          0.93
"twimg"         0.93
" gays"         0.90
" heifers"      0.89
" её"           0.89
" Gaga"         0.89

Activations Density 0.029%
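
To give a sense of scale for the density figure, the short sketch below converts it into an approximate token count. It assumes, and the page does not confirm this, that the density is the fraction of tokens in the dashboard prompts (24,576 prompts of 128 tokens each) on which the neuron has a nonzero activation. The variable names are illustrative only.

```python
# Figures taken from the Configuration and Activations Density lines above.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.029

# Assumption: density = share of dashboard tokens with nonzero activation.
total_tokens = num_prompts * tokens_per_prompt
approx_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:.0f}")
```

Under that assumption the neuron fires on roughly nine hundred of about 3.1 million tokens, which would make it a very sparse unit.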