Explanations

instances of emotional or evaluative language related to events or actions

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

"ationale"  -0.08
" nackte"  -0.07
"itoris"  -0.07
"ylko"  -0.07
"asal"  -0.07
"onga"  -0.07
"екар"  -0.07
"edback"  -0.07
" tiener"  -0.07
"hai"  -0.07

Positive Logits

" finally"  0.21
" Lastly"  0.21
"Lastly"  0.20
" Finally"  0.19
"finally"  0.17
"Finally"  0.16
" overall"  0.15
" altogether"  0.14
" Overall"  0.13
"all"  0.13

Activations Density 0.199%
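
To put the density figure in perspective, the arithmetic below converts it into an approximate token count. This is only a sketch: it assumes the density was measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each), which the page does not state explicitly. The function and constant names are illustrative.

```python
# Figures copied from the dashboard; the link between them is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.199


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of positions where the feature fires,
    assuming the density was computed over exactly `total` positions."""
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)
print(f"{total:,} token positions, roughly {active:,} with nonzero activation")
```

Under that assumption the feature would fire on only a few thousand of about three million token positions, which is consistent with a sparse feature.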