INDEX

Explanations

no offense

The neuron activates on negative replies or negation tokens (e.g. “no,” “not”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token            Logit
"and"            -1.22
"OTTEN"          -0.88
"덱"             -0.88
"jarati"         -0.86
"cování"         -0.85
"rinhos"         -0.85
"AfterEach"      -0.85
"ceptos"         -0.84
"収録曲"         -0.82
"cobacterium"    -0.82

Positive Logits

Token            Logit
"way"            1.53
" chance"        1.25
"no"             1.23
" offense"       1.22
                 1.20
" नहीं"           1.07
" offence"       1.07
"it"             1.04
" không"         1.00
"Way"            0.98

Activations Density 0.008%
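
To put the density figure in context, the short sketch below converts it into an approximate token count. It assumes that the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompts listed under Configuration; neither assumption is stated on the page, and the variable names are illustrative only.

```python
# Figures taken from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.008  # reported value, already rounded

# Total tokens across all dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = (tokens with nonzero activation) / (total tokens).
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. activating tokens: {approx_active_tokens:.0f}")
```

Under these assumptions the feature fires on roughly 250 of about 3.1 million tokens. Because the reported density is rounded, the count is only an estimate.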