Explanations

paradox and irony

The neuron fires on single, attention-grabbing topic or genre labels (e.g. “Paradox,” “Clickbait,” “Sarcasm,” “Loophole”) that typically appear as boldfaced or headline words.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token             Logit
"៥"               -0.93
"GARET"           -0.89
"roben"           -0.87
" influential"    -0.86
"劫"              -0.85
"beforeAll"       -0.84
"oat"             -0.84
"Profiles"        -0.83
" metaphor"       -0.82
"gives"           -0.82

Positive Logits

Token             Logit
" about"          1.30
" regarding"      1.12
" galore"         1.02
" ridden"         1.00
"ridden"          0.99
" involving"      0.95
"laden"           0.92
" perpetrated"    0.91
" detector"       0.90
" acerca"         0.89

Activations Density: 0.069%
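
To put the density figure in context, the sketch below estimates how many tokens it corresponds to. It assumes that the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each). Neither assumption is stated on this page, and the variable names are illustrative only.

```python
# Rough estimate of how many tokens the feature fires on.
# Assumption: density = (tokens with nonzero activation) / (all tokens),
# measured over the dashboard prompt set. The page does not confirm this.

num_prompts = 24_576        # from "Prompts (Dashboard)"
tokens_per_prompt = 128     # from "Prompts (Dashboard)"
density_percent = 0.069     # from "Activations Density"

total_tokens = num_prompts * tokens_per_prompt
active_tokens = total_tokens * density_percent / 100

print(f"total tokens:  {total_tokens:,}")
print(f"active tokens: about {round(active_tokens):,}")
```

Under these assumptions the feature is active on roughly two thousand of about three million tokens, which is consistent with a sparse feature that responds to a narrow class of label words.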