Explanations

No Explanations Found

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

0.34  "orescence"
0.33  "他们在"
0.33  " జాగ్ర"
0.32  " दिलचस्प"
0.32  " Конечно"
0.31
0.31  " неболь"
0.30  "Μ"
0.30  " Reach"
0.30  " অনেকের"

Positive Logits

0.50  " legitim"
0.49  " knowingly"
0.48  " disrespectful"
0.48  " immoral"
0.46  " illicit"
0.46  " violate"
0.45  "任何"
0.44  " unethical"
0.43  " कोणत्याही"
0.43  " violates"
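
Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. The sketch below illustrates that general technique on toy data. It is an assumption that this dashboard computes its lists this way, and the names `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical, as are the toy numbers.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by their score along a feature direction.

    decoder_dir: list of floats, the feature's direction (hypothetical input).
    unembed:     one row of floats per vocabulary token (hypothetical layout).
    vocab:       token strings, aligned with the rows of unembed.
    Returns (most_positive, most_negative), each a list of (token, score).
    """
    # Score of token j is the dot product of the direction with row j.
    scores = [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed]
    # Indices sorted from highest score to lowest.
    order = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
    most_positive = [(vocab[j], scores[j]) for j in order[:k]]
    most_negative = [(vocab[j], scores[j]) for j in order[::-1][:k]]
    return most_positive, most_negative


# Toy data only: a 4-token vocabulary and a 2-dimensional direction.
vocab = [" unethical", " violate", " Reach", "orescence"]
unembed = [[1.0, 0.0], [0.8, 0.1], [-0.5, 0.2], [-0.9, 0.0]]
decoder_dir = [0.5, 0.0]

positive, negative = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print(positive)
print(negative)
```

Tokens are kept as raw strings, including any leading space, because the leading space is part of the token in most subword vocabularies.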

Activations Density 0.856%
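
An activation density is usually read as the fraction of token positions on which the feature fires. The sketch below shows that calculation on a toy list and then estimates how many active tokens 0.856% would correspond to. The estimate assumes the density was measured over the 238,145 prompts of 512 tokens listed under Configuration, which the page does not state. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy example: 2 of 8 positions are active.
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
density = activation_density(sample)

# Assumption: density is taken over every token of the configured prompts.
total_tokens = 238_145 * 512
implied_active_tokens = round(total_tokens * 0.856 / 100)

print(density)
print(total_tokens)
print(implied_active_tokens)
```

Under that assumption the feature would be active on roughly one million of about 122 million tokens.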

No Known Activations