
Explanations

harm, injury, and death


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1


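Negative Logits (token, value; a leading space inside the quotes is part of the token)

" Turbulent"  0.37
" பரபர"  0.36  (Tamil word fragment)
" স্থিত"  0.36  (Bengali: situated, stable)
" Troubleshooting"  0.35
"نات"  0.35  (Arabic-script fragment)
" Efficiency"  0.34
"满了"  0.34  (Chinese: full, filled up)
" ინტერ"  0.34  (Georgian script: "inter-")
"裴"  0.33  (Chinese surname Pei)
"倔"  0.32  (Chinese: stubborn)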

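Positive Logits (token, value; a leading space inside the quotes is part of the token)

" death"  2.02
"死亡"  1.87  (Chinese/Japanese: death)
"death"  1.81
" deaths"  1.80
" muerte"  1.77  (Spanish: death)
" kematian"  1.73  (Indonesian/Malay: death)
" смерть"  1.70  (Russian: death)
" DEATH"  1.69
" śmierci"  1.69  (Polish: death, inflected form)
" مرگ"  1.67  (Persian: death)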
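The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. The dashboard does not say how these values are computed; a common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder vector directly through the model's unembedding matrix and sort the result. The function name `top_logit_effects` and the toy inputs are hypothetical, and a real pipeline may also fold in the final layer norm.

```python
import numpy as np


def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    w_dec: decoder direction of the feature, shape [d_model]
    w_u:   unembedding matrix, shape [d_model, n_vocab]
    vocab: list of token strings, length n_vocab
    """
    effects = w_dec @ w_u            # one logit contribution per vocabulary token
    order = np.argsort(effects)      # ascending: most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = [" death", " calm", " harm", " the"]
w_dec = np.array([1.0, 0.0])
w_u = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
pos, neg = top_logit_effects(w_dec, w_u, vocab, k=2)
print(pos)  # [(' death', 2.0), (' harm', 0.5)]
print(neg)  # [(' calm', -1.0), (' the', 0.0)]
```

Under this reading, the near-uniform "death" tokens across many languages at the top of the positive list suggest the decoder direction aligns with a language-independent concept of death.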

Activation Density: 0.046%
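Activation density is the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means a strictly positive activation and that the density is measured over every token of the configured dataset (the dashboard may instead use a sample, so the token count below is only an estimate under that assumption). The function `activation_density` and the toy array are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > threshold)) / acts.size


# Hypothetical toy activations: 2 prompts x 4 tokens, one nonzero entry.
toy = np.array([[0.0, 0.0, 1.3, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
print(activation_density(toy))  # 0.125

# Dataset size from the configuration: 238,145 prompts x 512 tokens each.
total_tokens = 238_145 * 512
print(total_tokens)  # 121930240

# Tokens implied by a 0.046% density if measured over the full dataset.
print(round(total_tokens * 0.046 / 100))  # 56088
```

At roughly 56 thousand of about 122 million tokens, the feature is sparse, which fits a narrow concept such as harm, injury, and death.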