Explanations

replace general concepts

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" lovingly"    0.51
" greats"    0.48
" стане"    0.47
" wasn"    0.45
" подру"    0.45
"াগর"    0.44
" playfully"    0.44
"躏"    0.44
" sweetly"    0.43
"籀"    0.43

Positive Logits

" abundant"    0.68
" corresponding"    0.65
" indispensable"    0.64
" overlapped"    0.63
" certain"    0.61
" aforementioned"    0.61
"Corresponding"    0.60
" noises"    0.59
" existed"    0.58
" abnormal"    0.57

Activations Density: 0.052%