Explanations

phrases indicating a sense of judgment or evaluation about situations or actions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token             Logit
(token missing)   -0.07
alle              -0.06
ouse              -0.06
NCY               -0.06
hi                -0.06
ilet              -0.06
 premises         -0.06
alez              -0.06
THEN              -0.05
=",               -0.05

Positive Logits

Token             Logit
emachine          0.07
uddy              0.07
Pes               0.07
chas              0.07
andro             0.06
kate              0.06
 dreaded          0.06
ROC               0.06
isos              0.06
 pharm            0.06

Activations Density 0.036%
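
To put the density figure in context, the short sketch below works out how many tokens it would correspond to in the prompt set listed under Configuration. It assumes that the density is the fraction of all sampled tokens on which the feature has a non-zero activation; the page does not state this definition, so treat it as an assumption. The function and constant names are illustrative only.

```python
# Figures taken from the page: 24,576 prompts of 128 tokens each,
# and an activation density of 0.036%.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.036


def total_tokens(prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the sampled prompt set."""
    return prompts * tokens_per_prompt


def implied_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Token count implied by a density given in percent.

    Assumption: density = (tokens with non-zero activation) / (all tokens).
    """
    return density_percent / 100.0 * n_tokens


n_tokens = total_tokens(PROMPTS, TOKENS_PER_PROMPT)
active = implied_active_tokens(DENSITY_PERCENT, n_tokens)

print(f"tokens sampled: {n_tokens:,}")
print(f"tokens implied active: about {round(active):,}")
```

Under that assumption, the feature fires on roughly 1,100 of the approximately 3.1 million sampled tokens, which makes it a sparse feature.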