Explanations

asserting independence or autonomy

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"ností"  0.43
"ências"  0.40
" Ки"  0.38
" taps"  0.37
" Πα"  0.37
"పె"  0.37
"тріш"  0.37
"釋"  0.36
"益"  0.36
" সেপ্ট"  0.36

Positive Logits

" stubborn"  0.70
" autonomy"  0.46
" जिद"  0.45  (Hindi: stubbornness)
" stubbornly"  0.44
" autonom"  0.43
" autonomous"  0.43
"autonomous"  0.43
" inflexible"  0.41
" rebellious"  0.41
"犟"  0.41  (Chinese: stubborn)
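Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most promoted and most suppressed tokens. The sketch below illustrates that idea under stated assumptions: `logit_effects`, `decoder_dir`, `W_U`, and the toy matrices are hypothetical, it ignores the final layer norm, and the dashboard may normalize or sign its values differently.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Direct effect of one feature's decoder direction on every token logit.

    decoder_dir: shape (d_model,), the feature's decoder vector (hypothetical input).
    W_U: shape (d_model, vocab_size), the unembedding matrix (hypothetical input).
    vocab: token strings indexed like the columns of W_U.
    Returns the k most-promoted and k most-suppressed (token, effect) pairs.
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # (vocab_size,)
    order = np.argsort(effects)  # ascending: most suppressed first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with made-up numbers: 2-dimensional residual stream, 4-token vocab.
vocab = [" stubborn", " autonomy", "ností", " taps"]
W_U = np.array([[0.70, 0.46, -0.43, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
pos, neg = logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(pos)
print(neg)
```

Sorting once with `argsort` and slicing both ends keeps the positive and negative lists consistent with each other.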

Activation Density: 0.003%
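To put the density in concrete terms, the arithmetic below estimates how many token positions activate the feature. It assumes, without confirmation from the dashboard, that density is measured per token over exactly the 238,145 × 512 dashboard tokens; since 0.003% is rounded, the true count could lie anywhere from roughly 3,000 to 4,300.

```python
# ASSUMPTION: density is the fraction of all dashboard tokens on which the feature fires.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY = 0.003 / 100  # 0.003% expressed as a fraction (displayed value is rounded)

def estimated_active_tokens(num_prompts, tokens_per_prompt, density):
    """Return (total token positions, expected number of active positions)."""
    total = num_prompts * tokens_per_prompt
    return total, total * density

total_tokens, active = estimated_active_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY)
print(f"{total_tokens:,} tokens, ~{active:,.0f} active")
```

Under these assumptions the feature fires on only a few thousand of about 122 million token positions.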