Explanations

apologize, redemption, remorse

The neuron detects words and word parts related to apologies (e.g. “apologize,” “apologies,” “unapologetic”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token            Logit
"are"            -1.63
(not rendered)   -1.15
"of"             -1.13
(not rendered)   -1.12
(not rendered)   -1.07
"vegli"          -1.04
"preprint"       -1.02
(not rendered)   -1.01
"と思ったら"      -1.01
(not rendered)   -1.00

Positive Logits

Token            Logit
"paket"          1.40
" maravillas"    1.28
" remorse"       1.27
"offering"       1.23
"össä"           1.23
"Szia"           1.23
"kamera"         1.23
" acciones"      1.23
" criaturas"     1.22
"موقع"           1.22

Activations Density 0.022%
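
To give the density figure a sense of scale, the sketch below combines it with the prompt configuration listed above (24,576 prompts of 128 tokens each). It assumes that "Activations Density" means the percentage of tokens in that prompt set on which the neuron fires at all; the page does not define the term, so treat the result as a rough estimate. The function names are illustrative, not part of any dashboard API.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.022  # "Activations Density" as reported


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(
    num_prompts: int, tokens_per_prompt: int, density_percent: float
) -> float:
    """Estimated count of tokens with a nonzero activation.

    Assumption (not stated on the page): density is
    active tokens / total tokens, expressed as a percentage.
    """
    return total_tokens(num_prompts, tokens_per_prompt) * density_percent / 100


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {active:.0f}")
```

Under that assumption the neuron is active on only a few hundred of roughly three million tokens, which fits a feature tied to a narrow word family such as apologies.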