Explanations

No Explanations Found

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" оптима"     0.41
"--"          0.39
" citer"      0.39
" советы"     0.39
" যেভাবে"     0.39
"Home"        0.37
"Statements"  0.37
"日上午"       0.37
"ें"           0.36
"傭"          0.36

Positive Logits

" decayed"      0.41
" họ"           0.38
" disgrace"     0.38
"😡"            0.38
"将其"           0.38
" humiliated"   0.38
" decidir"      0.38
" betrayed"     0.38
" decided"      0.37
" disgraceful"  0.37

Activations Density: 0.000%

No Known Activations