INDEX

Explanations

be clear or honest

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" बरसा"  0.71
"σκευ"  0.70
"perties"  0.67
"味わ"  0.66
"صی"  0.65
"membered"  0.65
"缀"  0.65
" पर्व"  0.64
" закреп"  0.64
"围"  0.63

Positive Logits

" honest"  2.09
" honesty"  1.93
" Honest"  1.81
"honest"  1.78
" transparency"  1.63
" truthful"  1.55
" transparent"  1.52
" transparence"  1.49
" Transparency"  1.47
"Transparency"  1.42

Activations Density: 0.603%
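
To give a rough sense of scale for the density figure, the sketch below converts it into an approximate token count. It assumes the density was measured over the same 238,145 prompts of 512 tokens each listed under Configuration, which the page does not state. The function name `estimated_active_tokens` is hypothetical and used only for illustration.

```python
def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> int:
    """Estimate how many tokens activate a feature, given its density.

    Assumption: the density is the percentage of all tokens in the
    prompt set on which the feature has a nonzero activation.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return round(total_tokens * density_percent / 100.0)


if __name__ == "__main__":
    # Figures taken from the dashboard; pairing them is an assumption.
    total = 238_145 * 512
    active = estimated_active_tokens(238_145, 512, 0.603)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {active:,}")
```

Under that assumption the feature would fire on well under one percent of tokens, which fits a feature whose top positive logits are all variants of "honest" and "transparency".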