INDEX

Explanations

expressions of concern or action related to safety and well-being

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

cerebras/SlimPajama-627B

Negative Logits

akis  -0.07
isko  -0.07
êµ¬  -0.07
__[  -0.07
_BANK  -0.07
ÑĥÐ½Ðº  -0.07
Ø¯ÙĬ  -0.07
äº  -0.07
scal  -0.07
_Generic  -0.07

Positive Logits

 welfare  0.09
 concern  0.09
 concerns  0.07
 preocup  0.07
 concerned  0.07
 Concern  0.07
 Sick  0.07
 Welfare  0.07
Wor  0.06
 whether  0.06

Activations Density 0.024%
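
As a rough sense of scale, the density figure can be converted into a token count. The sketch below assumes (this is not stated on the page) that activation density is the share of tokens across the dashboard prompts on which the feature has a nonzero activation. The variable names are illustrative only.

```python
# Figures taken from the dashboard configuration above.
num_prompts = 24_576
tokens_per_prompt = 128
activation_density_pct = 0.024  # percent, as displayed

# Total tokens in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt

# ASSUMPTION: density = share of tokens with a nonzero activation.
estimated_active_tokens = total_tokens * activation_density_pct / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:.0f}")
```

Under that assumption, the feature would be active on roughly 755 of about 3.1 million tokens.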