Explanations

fucking

This neuron mainly detects swear words and expletive intensifiers used for emphasis or to express strong emotion.

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
" Orrell"         0.77
" prototypical"   0.72
"ise"             0.68
"蔼"              0.68
"很多"            0.68
"旎"              0.67
"很好的"          0.66
" Kays"           0.66
" Biswas"         0.66
" formidable"     0.66

Positive Logits

Token   Logit
"Мо"    1.10
"М"     1.09
"Ма"    1.03
"Мі"    1.02
"А"     1.01
"Р"     0.94
"Х"     0.92
"Й"     0.91
"На"    0.87
"Т"     0.85

Activation Density: 0.008%
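
To give a sense of scale for the density figure, here is a minimal sketch that relates it to the prompt counts listed under Configuration. It assumes that "activation density" means the fraction of token positions on which the neuron's activation is nonzero, and that it was measured over the same 392,802 prompts of 256 tokens each; neither assumption is confirmed by this page. The function name `activation_density` is hypothetical and used only for illustration.

```python
# Assumption: "activation density" is the share of token positions
# on which the neuron's activation is nonzero.
def activation_density(activations):
    """Return the fraction of activations that are nonzero."""
    total = len(activations)
    if total == 0:
        return 0.0
    return sum(1 for a in activations if a != 0) / total


# Figures quoted on this page.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY = 0.008 / 100  # 0.008% expressed as a fraction

# Total token positions, assuming every prompt is exactly 256 tokens long.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Rough number of token positions on which the neuron fires.
estimated_active_tokens = round(total_tokens * DENSITY)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under these assumptions the neuron fires on roughly 8,000 of about 100 million token positions, which fits a feature that responds only to a narrow class of words.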