Explanations

tolerate or not tolerate

The neuron responds to negative or prohibitive constructions: words and phrases that express negation, prohibition, or condemnation (e.g., “no,” “not,” “shouldn’t,” “don’t,” “hate,” “condemning”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Tó          -0.78
性に        -0.73
 vermeiden  -0.73
 remarqu    -0.73
 compareTo  -0.72
дневно      -0.72
桎          -0.72
tyd         -0.72
kø          -0.72
 axial      -0.71

Positive Logits

con         2.75
 approve    2.13
 sanction   2.11
 endorse    2.03
 cond       1.91
 approval   1.85
 approving  1.77
 encourage  1.76
 tolerate   1.75
 toler      1.74
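Each logit list pairs a token with a signed value. Dashboards of this kind commonly derive such lists by taking the dot product of the neuron's output direction with each token's column of the model's unembedding matrix and then sorting. The sketch below assumes that method. The function name `top_logit_effects`, the toy vocabulary, and the toy matrix are all hypothetical; this is not the dashboard's own code or data.

```python
def top_logit_effects(direction, unembed, vocab, k=3):
    """Rank tokens by the dot product of an output direction with each
    token's unembedding column (a direct logit effect).

    Hypothetical helper: `unembed` is a d_model x vocab_size matrix
    stored as a list of rows, so column j belongs to vocab[j].
    """
    scores = []
    for token_idx, token in enumerate(vocab):
        score = sum(direction[i] * unembed[i][token_idx] for i in range(len(direction)))
        scores.append((token, score))
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    negative = scores[:k]                  # most negative first
    positive = scores[::-1][:k]            # most positive first
    return positive, negative


# Toy example (hypothetical values): d_model = 2, vocabulary of 4 tokens.
direction = [1.0, -0.5]
vocab = [" approve", " endorse", " axial", "Tó"]
unembed = [
    [2.0, 1.5, -0.2, -0.5],
    [-1.0, -1.0, 1.0, 0.6],
]
pos, neg = top_logit_effects(direction, unembed, vocab, k=2)
print([token for token, _ in pos])  # [' approve', ' endorse']
print([token for token, _ in neg])  # ['Tó', ' axial']
```

Under this assumption, a positive value means the neuron's output direction directly raises that token's logit and a negative value lowers it; later layers can still change the final effect on the prediction.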

Activation Density 0.060%
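Activation density is usually read as the fraction of token positions on which the neuron is active, meaning its activation is above zero. The sketch below assumes that definition with a zero threshold. For the scale check only, it also assumes the 0.060% figure was measured over the 24,576 prompts of 128 tokens listed under Configuration; the dashboard may compute density over a different sample. `activation_density` and the toy input are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of prompts, each a list of per-token values.
    Returns 0.0 for empty input instead of dividing by zero.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy input: 2 prompts x 4 tokens, with one active position.
toy = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(activation_density(toy))  # 0.125

# Scale check under the stated assumption: 24,576 prompts x 128 tokens.
total_tokens = 24_576 * 128
print(total_tokens)                  # 3145728
print(round(total_tokens * 0.0006))  # 1887
```

At that scale, 0.060% works out to roughly 1,900 active token positions out of about 3.1 million. Because the displayed percentage is rounded, the true count could differ somewhat.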