Explanations

how things change or work

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"мери"  0.49
"야"  0.49
" लाज"  0.46
"bete"  0.45
"ෙ"  0.44
"יין"  0.44
"૧"  0.44
"뷰"  0.43
"风险"  0.43
" assurer"  0.43

Positive Logits

" artificially"  0.64
" experiment"  0.62
" stimuli"  0.61
" changed"  0.61
" injected"  0.59
" experimental"  0.59
" manipulated"  0.59
" stimulus"  0.57
" increased"  0.57
" interventions"  0.56

Activation Density: 0.256%