Explanations

common phrases

The neuron fires on (mostly high-value) verbs and modifiers that describe changing or adjusting something, for example “give,” “take,” “add,” and “clear.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

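Negative Logits

Token           Logit
"Dinosaur"      -0.74
" therefrom"    -0.72
"demie"         -0.72
" acess"        -0.71
"入手"          -0.70  (Japanese/Chinese: "obtain, acquire")
" прид"         -0.70  (Russian word fragment)
"↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵"  -0.69  (run of newline characters)
" anew"         -0.69
"constra"       -0.69
" Tait"         -0.69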

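Positive Logits

Token           Logit
" things"       0.85
" stuff"        0.84
" التاريخ"      0.80  (Arabic: "the history" / "the date")
"org"           0.79
"syn"           0.76
" البلد"        0.75  (Arabic: "the country")
"东西"          0.75  (Chinese, simplified: "thing, stuff")
"東西"          0.75  (Chinese, traditional: "thing, stuff")
" einiges"      0.73  (German: "some things, quite a bit")
" troublesome"  0.73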
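The page does not say how these logit lists are computed. One common approach, assumed here rather than taken from the dashboard, is to project the neuron's output direction onto each token's unembedding vector and sort the resulting scores. The sketch below does that for a toy vocabulary; the function name `top_logits`, the dict-of-columns layout, and all numbers are hypothetical, and real pipelines may also account for the final layer norm.

```python
from typing import Dict, List, Tuple


def top_logits(
    direction: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs.

    ASSUMPTION: logit[token] = dot(direction, unembed[token]), where each
    unembed[token] is that token's unembedding column (same length as direction).
    """
    scores = [
        (tok, sum(d * u for d, u in zip(direction, col)))
        for tok, col in unembed.items()
    ]
    # Sort ascending by logit: the head holds the most negative scores,
    # the tail holds the most positive ones.
    scores.sort(key=lambda pair: pair[1])
    most_negative = scores[:k]
    most_positive = list(reversed(scores[-k:]))
    return most_positive, most_negative


# Hypothetical 2-dimensional toy model, not real weights.
toy_direction = [1.0, -0.5]
toy_unembed = {
    " things": [0.6, -0.5],
    " stuff": [0.5, -0.6],
    " the": [0.1, 0.2],
    " anew": [-0.4, 0.6],
    "Dinosaur": [-0.5, 0.5],
}
positive, negative = top_logits(toy_direction, toy_unembed, k=2)
print("positive:", positive)
print("negative:", negative)
```

With a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the Python loop, but the ranking logic is the same.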

Activation density: 0.051%
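The configuration figures and the density can be combined in a quick sanity check. This sketch assumes, without confirmation from the page, that the 0.051% density is measured over the same 24,576 × 128 dashboard tokens; the constant names are illustrative only.

```python
# ASSUMPTION: the activation density is measured over the dashboard prompts.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.051 / 100  # 0.051% expressed as a fraction

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = round(total_tokens * ACTIVATION_DENSITY)

print(f"total tokens:          {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,}")
```

Under that assumption it prints 3,145,728 total tokens and about 1,604 active tokens. Because 0.051% is itself rounded, the true count could fall anywhere from roughly 1,590 to 1,620.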