Explanations

don't do or advise against

The neuron fires on advisory or warning language: phrases that tell the reader what not to do or what is recommended, such as "don't," "shouldn't," "avoid," "advise," and "recommend."

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token           Logit
" morreu"       -1.16
" všechny"      -1.11
"บ้าง"            -1.09
" instead"      -1.09
"CAPT"          -1.07
"quée"          -1.04
" encontrou"    -1.02
"낍"            -1.01
"quiao"         -1.01
"CAPÍTULO"      -1.01

Positive Logits

Token           Logit
"any"           1.43
" unless"       1.30
" terlalu"      1.29
" EVER"         1.21
" anymore"      1.20
"use"           1.14
"it"            1.13
" anyone"       1.09
"for"           1.05
"有任何"        1.05

Activations Density: 0.034%
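
To give a sense of scale for the density figure, the arithmetic below combines it with the dashboard prompt configuration listed above. This is a rough sketch, not the dashboard's own computation: it assumes that "density" means the fraction of token positions on which the neuron has a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens. The function and constant names are hypothetical.

```python
# Assumption: density = share of token positions with nonzero activation,
# measured over the dashboard prompt set (24,576 prompts x 128 tokens).
N_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.034


def total_tokens(n_prompts: int, tokens_per_prompt: int) -> int:
    """Number of token positions in the prompt set."""
    return n_prompts * tokens_per_prompt


def estimated_active_tokens(
    n_prompts: int, tokens_per_prompt: int, density_percent: float
) -> int:
    """Approximate count of token positions where the neuron is active."""
    total = total_tokens(n_prompts, tokens_per_prompt)
    return round(total * density_percent / 100)


total = total_tokens(N_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(N_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT)
print(f"{total:,} token positions, roughly {active:,} with nonzero activation")
```

Under these assumptions the neuron is active on roughly one token position in three thousand, which fits a feature that responds only to a narrow class of advisory phrasing.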