Explanations

terms associated with personal history and social relationships

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

Token             Logit
" oneself"        -0.07
" ourselves"      -0.07
" yourself"       -0.07
" yourselves"     -0.07
"본"              -0.06
" дор"            -0.06
"conte"           -0.06
"whose"           -0.06
" myself"         -0.06
" susceptible"    -0.05

Positive Logits

Token             Logit
"iveness"         0.11
"vis"             0.10
" abilities"      0.10
" regarding"      0.10
"iness"           0.09
"fulness"         0.09
"lessness"        0.09
" toward"         0.09
" towards"        0.09
"(s"              0.09

Activations Density 0.186%
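
To give the density figure a sense of scale, the arithmetic below combines it with the prompt configuration listed above. This is a rough sketch, not the dashboard's own computation: it assumes that the density is the fraction of token positions on which the feature has a nonzero activation, and that it was measured over the same 24,576 prompts of 128 tokens. The variable names are illustrative.

```python
# Values copied from the dashboard.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.186

# Total token positions covered by the dashboard prompts.
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = active token positions / total token positions.
approx_active = round(total_tokens * DENSITY_PERCENT / 100)

print(f"total tokens:          {total_tokens:,}")
print(f"approx. active tokens: {approx_active:,}")
```

Under those assumptions the feature fires on roughly 5,851 of 3,145,728 token positions, which is consistent with a sparse feature.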