INDEX

Explanations

attacker information Jewish Harry deception


Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1


Negative Logits

 sociais  0.55
 árv  0.54
 божомолдор  0.52
Customize  0.52
ㄾ  0.52
 ženy  0.51
<%@  0.50
粔  0.49
〢  0.49
 atualizar  0.48

Positive Logits

↵↵↵  0.43
en  0.42
on  0.42
 Harry  0.42
↵↵  0.42
 deception  0.41
 Companion  0.41
纳米  0.40
  0.39
Cas  0.39
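The two lists above rank tokens by how strongly this feature pushes their output logits down or up. A common way feature dashboards produce such lists is to project the feature's SAE decoder direction through the model's unembedding matrix and keep the two extremes. The sketch below assumes that method, which this page does not itself confirm. The names `logit_effects`, `decoder_dir`, `W_U` and `vocab` are hypothetical, and the sketch ignores any final LayerNorm.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by one feature's direct effect on the output logits.

    Hypothetical helper (an assumption, not this dashboard's code):
      decoder_dir: (d_model,) decoder row of one SAE feature
      W_U:         (d_model, d_vocab) unembedding matrix
      vocab:       list of d_vocab token strings
    Returns (positive, negative) lists of (token, signed logit effect).
    Ignores any final LayerNorm applied before the unembedding.
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (d_vocab,)
    order = np.argsort(effects)                          # most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy usage: a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -2.0]])
pos, neg = logit_effects(np.array([1.0, 0.5]), W_U, ["a", "b", "c", "d"], k=2)
print(pos)  # [('a', 1.0), ('b', 0.5)]
print(neg)  # [('c', -1.0), ('d', -0.5)]
```

Both lists come from the ends of a single projection, so the values in this sketch are signed.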

Activation Density: 0.004%
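Activation density is the fraction of token positions on which the feature fires. The sketch below computes it from an array of activations. It then converts 0.004% into an approximate count of firing tokens, assuming the density was measured over every token of the 238,145 prompts × 512 tokens listed above (the dashboard may instead use a sample). `activation_density` and its zero threshold are hypothetical choices.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


print(activation_density([0.0, 0.0, 0.3, 0.0]))  # 0.25

# Assumption: density measured over every token of the dashboard prompt set.
total_tokens = 238_145 * 512            # 121,930,240 token positions
firing = total_tokens * 0.004 / 100     # 0.004% expressed as a fraction
print(total_tokens, round(firing))      # 121930240 4877
```

Under that assumption, the feature fires on roughly 4,900 of about 122 million tokens, which is consistent with a very sparse feature.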