INDEX

Explanations

killing, harming, punishable

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

ク    0.50
特    0.49
ч    0.47
編    0.46
ти    0.46
帶    0.44
ᅧ    0.44
Р    0.44
ब    0.43
加    0.43

Positive Logits

 امیدوار    0.63
 hopes    0.47
 надеюсь    0.47
 שא    0.46
 southeastern    0.46
 philanthrop    0.46
 corroborate    0.45
 امید    0.45
 रिजल्ट    0.44

Activations Density 0.004%