INDEX

Explanations

opinions expressed are mine

This neuron detects disclaimer language: phrases stating that opinions or content belong to the author alone and not to an affiliated party (e.g. “the thoughts and opinions expressed are those of the writer and not necessarily those of…”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token               Logit
"vori"              -0.93
" mention"          -0.88
(blank)             -0.85
"for"               -0.84
"UINTN"             -0.82
" mentions"         -0.81
"to"                -0.79
" part"             -0.77
" portions"         -0.77
" names"            -0.76

Positive Logits

Token               Logit
"expressed"         1.21
"sole"              1.13
" henkil"           1.12
" solely"           1.10
" those"            1.07
" исключительно"    1.02
"alone"             1.00
" expressed"        0.98
" riêng"            0.98
" eivät"            0.96

Activations Density 0.028%
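
To give a sense of scale, the figures in the Configuration section and the activation density can be combined into a rough count of activating tokens. The sketch below is only illustrative: it assumes the density is the fraction of dashboard tokens on which the neuron fires, which the page does not state, and the function names are hypothetical. The density is rounded on the page, so the result is an estimate only.

```python
# Rough scale implied by the configuration and density figures.
# Assumption: the density is the fraction of dashboard tokens on which the
# neuron fires; the source does not say how it was measured.

NUM_PROMPTS = 24_576       # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 128    # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.028    # from "Activations Density" (rounded on the page)


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate number of token positions with a nonzero activation."""
    return total * density_percent / 100.0


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, DENSITY_PERCENT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: about {active:.0f}")
```

Under that assumption, the neuron fires on fewer than a thousand of roughly three million token positions, which fits a feature tied to a narrow phrase pattern such as disclaimers.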