Explanations

The token “good” followed by nouns such as “range,” “taste,” “code,” “weather,” or “practices.”

The neuron fires strongly on positive evaluative adjectives conveying praise or approval (e.g. “good,” “great,” “better,” “fantastic,” “ideal”).

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

১০    1.39
(token missing in source)    1.16
 DiCaprio    1.12
Şu    1.04
่    1.01
lu    1.00
loten    0.99
də    0.99
 diseases    0.98
০    0.96

Positive Logits

상을    1.14
 खासा    1.09
ست    1.06
 качества    0.98
ắn    0.96
삭    0.96
ية    0.90
 suited    0.90
 fortune    0.90
นี่    0.89

Activations Density 0.844%
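
To put the density figure in context, the sketch below assumes that "activation density" means the fraction of token positions on which the neuron's activation is above zero, measured over the prompt set listed under Configuration. That definition, the helper name `activation_density`, and the toy activations are assumptions for illustration. Only the three constants come from this page.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    if total == 0:
        raise ValueError("no token positions supplied")
    return active / total


# Toy activations: 4 prompts of 5 tokens each (made-up values).
toy = [
    [0.0, 0.0, 1.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
toy_density = activation_density(toy)  # 2 active positions out of 20

# Figures reported on this page.
N_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY = 0.00844  # 0.844%

total_tokens = N_PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = round(total_tokens * DENSITY)
```

Under that reading, the prompt set holds about 100.6 million tokens in total, and the neuron would be active on roughly 850,000 of them.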