Explanations

positive human attributes and skills

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 swapped        0.51
 culprits       0.50
 oors           0.50
 craz           0.49
                0.46
 hyped          0.45
 guilty         0.45
 subreddit      0.45
↵               0.45

Positive Logits

 తన             0.66
 талант         0.65
 항상           0.60
 skillfully     0.60
 завжди         0.59
 hänen          0.58
他对            0.58
 തന്റെ          0.57
 его            0.57
 impressively   0.57

Activations Density 0.101%