INDEX

Explanations

phrases indicating the concept of control or authority

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

rame

-0.07

uario

-0.07

_sink

-0.07

 DropIndex

-0.07

лив

-0.07

deen

-0.06

лива

-0.06

 MemoryStream

-0.06

MITTED

-0.06

Positive Logits

 control

0.11

/control

0.10

-control

0.09

control

0.09

 Control

0.09

Control

0.08

 CONTROL

0.08

(control

0.08

/power

0.08

CONTROL

0.07

Activations Density 0.009%
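
For a rough sense of scale, the configuration figures above can be combined with the activation density. The sketch below assumes the density was measured over the same 24,576 dashboard prompts and that it means the share of tokens on which the feature has a nonzero activation; the page states neither, so the second number is only an estimate. The variable names are illustrative.

```python
# Figures quoted on this page.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.009

# Total number of tokens covered by the dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = share of those tokens with a nonzero activation.
estimated_active_tokens = round(total_tokens * ACTIVATION_DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {estimated_active_tokens:,}")
```

Under these assumptions the feature fires on only a few hundred of roughly three million tokens, which fits a narrow feature centered on "control" tokens.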