INDEX

Explanations

fear of negative outcomes

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" באופן"      0.80
"盒"          0.80
" בד"         0.80
" طراحی"      0.78
" וה"         0.78
"ollowing"    0.78
"itectura"    0.76
"heses"       0.75
"並"          0.75
" تقريبا"     0.74

Positive Logits

" repercussions"    1.44
" ridicule"         1.40
" humiliation"      1.24
" backlash"         1.23
" embarrassment"    1.21
" losing"           1.15
" betrayal"         1.12
" embarrassing"     1.12
" disgrace"         1.10
" relapse"          1.10

Activations Density 0.136%
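
To put the density figure in context, the sketch below estimates how many tokens the feature fires on. It assumes that activation density means the fraction of dataset tokens with a nonzero activation, and that every one of the 238,145 prompts contributes exactly 512 tokens. Both are assumptions rather than statements from the dashboard. The function names are hypothetical, and because 0.136% is a rounded value, the result is only approximate.

```python
# Figures copied from the dashboard configuration above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY = 0.136 / 100  # 0.136% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total tokens in the dashboard dataset, assuming fixed-length prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of tokens on which the feature is nonzero.

    Assumes density = active tokens / total tokens.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under these assumptions the feature fires on roughly one token in every 735, which fits a feature that tracks a narrow concept such as fear of negative outcomes.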