Explanations

shame, humiliation, inadequacy, distress

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token             Logit
"äft"             0.71
"பதி"             0.68
" optimism"       0.64
" motivational"   0.62
" motivación"     0.61
" motivations"    0.61
" motivation"     0.61
"<0x8D>"          0.60
" παρά"           0.60
" coincidence"    0.60

Positive Logits

Token             Logit
"Sch"             0.92
" malu"           0.76
"ŝ"               0.75
" shame"          0.75
" مردم"           0.75
" humiliation"    0.74
"Gog"             0.73
" وفق"            0.73
" html"           0.72
" Shame"          0.72

Activations Density 0.266%
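
To give the density figure a sense of scale, the sketch below combines it with the prompt configuration listed above. It rests on two assumptions that the page itself does not state: that every prompt contributes its full 512 tokens, and that "Activations Density" means the fraction of those tokens on which the feature has a nonzero activation. The variable names are illustrative only.

```python
# Figures copied from the dashboard text above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY = 0.266 / 100  # 0.266% expressed as a fraction

# Assumption: every prompt is exactly 512 tokens long, with no padding excluded.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density is the share of tokens with a nonzero activation.
estimated_active_tokens = round(total_tokens * ACTIVATION_DENSITY)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under those assumptions the feature would fire on roughly a few hundred thousand tokens out of about 122 million, which is consistent with a sparse feature tied to a narrow topic such as shame and humiliation.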