Explanations

respect

The neuron strongly activates on occurrences of the word “respect” (and its direct variants like “disrespect”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token          Logit
" aktuell"     -2.05
"郸"           -2.03
(not shown)    -2.00
"liziert"      -1.86
"してた"       -1.84
" всем"        -1.81
"зм"           -1.80
"mpagne"       -1.79
(not shown)    -1.77
" accuse"      -1.77

Positive Logits

Token          Logit
"respect"      2.00
"鏵"           2.00
" simpel"      1.96
" maske"       1.95
"even"         1.87
" deri"        1.86
"镲"           1.80
"໊"            1.76
" boks"        1.73
"jue"          1.71

Activations Density 0.013%
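
To give the density figure a sense of scale, the sketch below combines it with the prompt counts listed under Configuration. It assumes that the density is the fraction of all dashboard tokens on which the neuron fires, and that it was measured over the same 24,576 prompts of 128 tokens each. The page does not confirm either assumption, and the function names are illustrative only.

```python
# Values quoted from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.013


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate token count on which the neuron fires.

    Assumption: density is the fraction of all tokens with a nonzero
    activation, expressed as a percentage.
    """
    return total * density_percent / 100


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, DENSITY_PERCENT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {active:.0f}")
```

Under these assumptions the dashboard covers 3,145,728 tokens, and the neuron fires on roughly 409 of them. That is consistent with a neuron tied to a single word such as "respect".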