INDEX

Explanations

opposite pairs and counterparts

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"でない"      -0.86
"chés"        -0.83
" gesche"     -0.80
"Herzliche"   -0.79
"newblock"    -0.79
"Implement"   -0.79
"Environment" -0.78
"enig"        -0.78
"(>"          -0.78
"}{*}{"       -0.77

Positive Logits

" opposite"    1.25
" reversed"    1.20
" Opposite"    1.06
" reverse"     1.05
"Opposite"     0.94
" flipped"     0.94
" counterpart" 0.93
"flipped"      0.91
" reverses"    0.91
" reversal"    0.91

Activations Density 0.058%
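
To put the density figure in context, the sketch below estimates how many tokens the feature fires on. It assumes the density is the fraction of tokens with nonzero activation, measured over the dashboard prompt set listed above (24,576 prompts of 128 tokens each); the page does not state this, so treat the result as a rough estimate. The function and constant names are illustrative only.

```python
# Figures quoted on the dashboard page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.058


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated token count with nonzero activation.

    Assumption: density is a per-token fraction over this same prompt set.
    """
    return total * density_percent / 100


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,.0f}")
```

Under that assumption the feature is active on roughly one token in every 1,700, which fits a feature tied to a narrow set of words such as "opposite" and "reversed".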