
Explanations

speaking out freely

suppression of dissent and judgment


Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted


Negative Logits

Token            Logit
Batt             0.43
opedic           0.37
π                0.37
batt             0.37
椥               0.36
IsPass           0.36
 სან             0.36
 सद्              0.35
 Manipulation    0.35
自信             0.35

Positive Logits

Token            Logit
 inconvenient    1.13
 dissent         1.00
 dissenting      0.92
 perceived       0.90
 pesky           0.85
 unwelcome       0.84
忤               0.84
 inconvenience   0.81
 slightest       0.80
 undesired       0.79

Activation Density: 0.408%