INDEX

Explanations

weird, strange, odd

The neuron flags descriptive adjectives that signal something odd or unusual.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to keep leading spaces)    Logit
"gewiesen"    -1.02
" zeigt"      -0.90
"殍"          -0.89
"czeniu"      -0.86
" salão"      -0.85
" forse"      -0.84
"糁"          -0.84
"atrician"    -0.83
" thèmes"     -0.83
"بسم"         -0.83

Positive Logits

Token (quoted to keep leading spaces)    Logit
"but"               1.38
" behavior"         1.38
" shaped"           1.36
" circumstance"     1.32
" twist"            1.27
" behaviour"        1.21
"ोग"                1.20
" phrasing"         1.20
" happenings"       1.20
" circumstances"    1.20

Activations Density 0.041%
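
To put the density figure in context, the short sketch below converts it into an approximate token count using the prompt configuration listed above. It assumes that "Activations Density" means the fraction of all dashboard tokens on which the neuron fires; the dashboard does not state this definition, so treat the result as a rough estimate. The function and constant names are illustrative only.

```python
# Figures copied from the dashboard. The meaning of "density" used here
# (fraction of tokens with a nonzero activation) is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.041 / 100  # 0.041% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(density: float, n_tokens: int) -> float:
    """Approximate number of tokens on which the neuron activates."""
    return density * n_tokens


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(ACTIVATION_DENSITY, total)

print(f"total tokens: {total:,}")
print(f"estimated activating tokens: {active:,.0f}")
```

Under that assumption the neuron fires on roughly 1,290 of about 3.1 million tokens. Because the displayed density is rounded to two significant figures, the true count could differ by a few dozen tokens.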