Explanations

human suffering

Configuration

Prompts (Dashboard): 16,384 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token               Logit
"DockStyle"         -0.82
"LookAnd"           -0.82
(blank)             -0.79
"CloseOperation"    -0.78
"+#+#"              -0.75
"__*/"              -0.71
" Мексичка"         -0.71
" deforestation"    -0.70
"IntoConstraints"   -0.69
"menistan"          -0.64

Positive Logits

Token               Logit
"ist"               0.68
"ary"               0.67
"in"                0.61
"ists"              0.57
"of"                0.55
" from"             0.52
"al"                0.52
" among"            0.52
"ally"              0.51
" caused"           0.50

Activations Density 0.083%
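
To give a sense of scale for the density figure, the short sketch below combines it with the dashboard prompt configuration listed above. It assumes (this is not stated on the page) that the activation density is the fraction of tokens in the dashboard prompt set on which the feature fires. The function names are illustrative only.

```python
# Figures copied from the dashboard text.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.083 / 100  # 0.083% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(total: int, density: float) -> float:
    """Approximate count of token positions where the feature is active.

    ASSUMPTION: density is measured over the same dashboard prompt set.
    """
    return total * density


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = expected_active_tokens(total, ACTIVATION_DENSITY)

print(f"total tokens: {total:,}")
print(f"approx. active tokens: {round(active):,}")
```

Under that assumption, the feature is active on roughly one token position in every 1,200, which is typical of a sparse feature; the exact count depends on how the dashboard rounds the displayed percentage.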