Explanations

counterfactuals, ELAB, Facebook, percentage, Life, Boundary

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"ب"  0.91
"ASSI"  0.77
"mari"  0.76
"aulay"  0.76
"Alban"  0.76
"ق"  0.75
"CLASSI"  0.72
"蟬"  0.72
"ад"  0.71
"converting"  0.71

Positive Logits

" helst"  0.75
"担心"  0.74
"名为"  0.72
" następnie"  0.71
" związku"  0.70
" nombres"  0.70
"ηση"  0.70
" mineralization"  0.70
" nombre"  0.69
" nema"  0.68

Activations Density 0.001%
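
As a rough sanity check, the density figure can be turned into an approximate token count. The sketch below assumes that activation density means the fraction of tokens on which the feature fires, and that it was measured over the corpus listed under Prompts (Dashboard); neither assumption is stated on the page. The function name `estimate_active_tokens` is hypothetical, and because the displayed 0.001% is rounded, the result is only an order-of-magnitude estimate.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512

# Displayed density is 0.001%; convert the percentage to a fraction.
# Assumption: density = (tokens with nonzero activation) / (total tokens).
ACTIVATION_DENSITY = 0.001 / 100


def estimate_active_tokens(num_prompts, tokens_per_prompt, density):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total = num_prompts * tokens_per_prompt
    return total, total * density


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,.0f}")
```

Under these assumptions the feature would fire on roughly a thousand tokens out of about 122 million, which fits a very sparse feature.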