Explanations

references to an authoritative or influential figure, often with a negative connotation of manipulation or deceit

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 useStyles          -0.70
AnimationsModule    -0.66
 noDo               -0.62
 שוליים             -0.62
testify             -0.59
CppCodeGen          -0.54
IntoConstraints     -0.52
DropTable           -0.52
readyState          -0.51
chowa               -0.50

Positive Logits

hir     3.88
HIR     1.90
Hir     1.77
hir     1.72
Hir     1.64
hiri    1.20
hira    1.14
HIR     0.95
heer    0.89
hirt    0.83

Activations Density 0.001%
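
To give the density figure some scale, the sketch below converts it into an approximate token count. It assumes the density is the fraction of tokens in the dashboard prompt set (24,576 prompts of 128 tokens each) on which the feature has a nonzero activation; the page does not state this. The displayed 0.001% is rounded, so the result is only an order-of-magnitude estimate.

```python
# Figures copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.001  # rounded value as displayed

# Total tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# ASSUMPTION: density = active tokens / total tokens over this prompt set.
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {round(expected_active)}")
```

Under that assumption the feature fires on a few dozen tokens out of roughly three million, which fits a feature whose top positive logits are all variants of a single rare string.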