INDEX

Explanations

how things behave

This neuron fires on words that refer to people or entities—especially personal pronouns (I, me, they) and nouns denoting human or other agents.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
" ruchu"       -0.80
" recibe"      -0.77
"ネート"       -0.76
" előtt"       -0.75
"ceptible"     -0.74
"мови"         -0.73
"msen"         -0.72
" ਨੂੰ"          -0.71
"Է"            -0.70
"繚"           -0.70

Positive Logits

Token          Logit
" behave"      4.88
" behaving"    4.41
" behaves"     4.41
" acting"      4.16
" behaved"     4.09
"act"          4.00
" acted"       3.80
"acting"       3.38
" acts"        3.25
"Acting"       3.11

Activations Density 0.089%
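
For a rough sense of scale, the density figure can be combined with the dashboard prompt configuration (24,576 prompts of 128 tokens each). The sketch below assumes that the density is the fraction of token positions in those prompts where the neuron has a non-zero activation; the page does not define the metric, so treat that reading as an assumption. The function name `estimate_active_tokens` is hypothetical and used only for illustration.

```python
def estimate_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> int:
    """Estimate how many token positions activate the neuron.

    Assumption: density_percent is the share of all token positions
    in the dashboard prompts with a non-zero activation.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100)


# Values taken from the Configuration and Activations Density entries.
total_tokens = 24_576 * 128
active_tokens = estimate_active_tokens(24_576, 128, 0.089)
print(f"{active_tokens} of {total_tokens} token positions")
```

Under that assumption, the neuron is active on only a few thousand of the roughly three million token positions, which fits a sparse, narrowly selective unit.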