INDEX

Explanations

refusing harmful requests

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token        Value
" GitLab"    0.43
"getTable"   0.37
"会在"       0.37
" الدولي"    0.36
"greSQL"     0.36
" স্বর্গ"     0.35
"ng"         0.35
"⟫"          0.35
"емом"       0.34
"त्री"        0.34

Positive Logits

Token            Value
"祗"             0.37
" Conditioning"  0.37
" ব্যা"           0.35
"ஒ"              0.35
"音楽"           0.34
"サラ"           0.34
"жени"           0.34
"addListener"    0.33
" হাতি"          0.33
" Klar"          0.33

Activations Density 0.001%
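
As a rough sanity check on the figures above, the sketch below estimates how many dashboard tokens the feature fires on. It assumes that "Activations Density" means the fraction of tokens with a nonzero activation and that it was measured over the 238,145 x 512-token dashboard prompts; the page states neither, and the function and variable names are illustrative only.

```python
# Assumption: "Activations Density" is the fraction of dashboard tokens on
# which the feature has a nonzero activation, measured over the prompt set
# listed under Configuration. The page itself does not confirm this.
NUM_PROMPTS = 238_145        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512      # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.001      # from "Activations Density 0.001%"


def approx_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_percent / 100.0
    return total_tokens, active_tokens


total_tokens, active_tokens = approx_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {active_tokens:,.0f}")
```

Under these assumptions the feature fires on roughly 1,200 of about 122 million tokens. Because the displayed density is rounded to one significant digit, the true count could plausibly lie anywhere from about 600 to 1,800 tokens.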