Explanations

conjunctions and punctuation

This neuron detects strongly opinionated or evaluative words and phrases—especially negative adjectives and adverbs that convey a subjective judgment.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" stratég"  -1.86
"瑒"  -1.84
"äldrar"  -1.80
" mauva"  -1.77
"獃"  -1.72
"ître"  -1.70
" acheter"  -1.69
"嫑"  -1.69
"瓛"  -1.68
"翃"  -1.66

Positive Logits

"or"  2.34
" because"  2.06
[token not shown]  2.03
[token not shown]  2.02
" putting"  2.00
"one"  1.98
"if"  1.91
" when"  1.90
" first"  1.88
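
As a rough reading aid for the two logit lists: if each listed value is taken as an additive contribution to a token's pre-softmax logit (an assumption; the page does not state how the values are computed), then a contribution of x multiplies that token's unnormalized softmax weight by e^x. The sketch below applies this to the largest positive and the most negative listed values. `odds_multiplier` is a hypothetical helper, not part of any dashboard API.

```python
import math


def odds_multiplier(logit_delta: float) -> float:
    """Factor by which a token's unnormalized softmax weight changes
    when logit_delta is added to its logit (assumed additive effect)."""
    return math.exp(logit_delta)


# Values copied from the logit lists on this page.
top_positive = 2.34   # token "or"
top_negative = -1.86  # token " stratég"

boost = odds_multiplier(top_positive)
suppress = odds_multiplier(top_negative)

print(f"+2.34 multiplies the token's weight by about {boost:.2f}")
print(f"-1.86 multiplies the token's weight by about {suppress:.2f}")
```

Under that assumption, the promoted tokens gain roughly an order of magnitude in relative weight per unit of activation, while the suppressed tokens lose most of theirs.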

Activations Density: 0.095%
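
To put the density figure in absolute terms, the sketch below estimates how many tokens the neuron fires on. It assumes the 0.095% density is measured over the 24,576 prompts of 128 tokens each listed under Configuration, which the page does not confirm. Variable names are illustrative.

```python
# Figures copied from the page; the link between them is an assumption.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.095

total_tokens = num_prompts * tokens_per_prompt

# Approximate number of tokens on which the neuron has a nonzero activation.
active_tokens = round(total_tokens * density_percent / 100)

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {active_tokens:,}")
```

If the assumption holds, the neuron is active on only a few thousand of the roughly three million tokens in the prompt set.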