Explanations

expressions of negation or denial

The neuron picks out words and short phrases that signal strong subjective emphasis or hedging of an opinion (e.g. “obsessed,” “at all,” “at least,” “claims,” “know,” “see”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

itiés

-0.88

'&:

-0.85

douard

-0.84

 unfailing

-0.83

avant

-0.82

 liksom

-0.79

BURGH

-0.78

 Algunas

-0.77

uza

-0.77

 <<<<<<<<<<<<<<

-0.76

Positive Logits

any

1.05

 nothing

0.82

 dijeron

0.82

interest

0.80

Interest

0.78

 никаких

0.78

有任何

0.78

écart

0.77

 сър

0.77



0.75

Activations Density 0.036%
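
To put the density figure in context, here is a rough back-of-the-envelope sketch. It assumes (this is not stated on the page) that the 0.036% density was measured over the same dashboard prompts listed under Configuration, i.e. 24,576 prompts of 128 tokens each. The function name `expected_active_tokens` is hypothetical and used only for illustration.

```python
def expected_active_tokens(num_prompts: int, tokens_per_prompt: int,
                           density_percent: float) -> tuple[int, float]:
    """Return (total tokens, approximate number of tokens with nonzero activation).

    Assumption: the density percentage applies to exactly this token set.
    """
    total_tokens = num_prompts * tokens_per_prompt
    expected_active = total_tokens * density_percent / 100.0
    return total_tokens, expected_active


# Values taken from the Configuration and Activations Density entries above.
total, active = expected_active_tokens(24_576, 128, 0.036)
print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:,.0f}")
```

Under that assumption, the feature fires on roughly a thousand of about three million tokens, which fits a narrow feature tied to specific negation-related words.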