Explanations

familiarity and expectedness

The neuron activates strongly on the word “familiar” and related forms such as “familiarity”.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)  Logit
"蹰"  -3.39
"萇"  -2.86
"瘊"  -2.64
" marvelous"  -2.59
"鍺"  -2.58
" různých"  -2.55
" těch"  -2.52
""  -2.52
" tumultuous"  -2.48
"歘"  -2.48

Positive Logits

Token (quoted to show leading spaces)  Logit
" Surprisingly"  2.95
"Surprisingly"  2.69
"Even"  2.69
" Basically"  2.67
" Following"  2.66
" Almost"  2.66
" That"  2.61
" Taking"  2.59
" Before"  2.58
" Making"  2.56

Activations Density 0.006%
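
To put the density figure in perspective, the short Python sketch below combines it with the prompt configuration listed above. It assumes, without confirmation from the dashboard, that the density is the fraction of tokens in those 24,576 prompts on which the neuron fires at all. The constant and function names are illustrative only and do not come from any tool.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.006 / 100  # 0.006% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(n_tokens: int, density: float) -> float:
    """Approximate count of positions with a nonzero activation.

    Assumption: density is measured over exactly these n_tokens positions.
    """
    return n_tokens * density


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY)

print(f"{total:,} token positions in the dashboard sample")
print(f"about {active:.0f} positions with a nonzero activation")
```

Under that assumption the neuron fires on fewer than 200 of roughly 3.1 million token positions, so the top-activating examples come from a very small sample.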