INDEX

Explanations

exploiting weaknesses

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token             Logit
" sehen"          0.42
"lios"            0.42
" keperluan"      0.42
"蓴"              0.42
"INED"            0.41
" اہمیت"          0.40
" sockaddr"       0.40
" recommandée"    0.40
" recommand"      0.39
" Пра"            0.39

Positive Logits

Token             Logit
" weaknesses"     0.86
" weakness"       0.80
" দুর্বল"          0.75
"weak"            0.73
"弱"              0.71
" Weak"           0.68
" weak"           0.66
"Weak"            0.66
" कमजोरी"         0.66
" weakly"         0.65

Activations Density 0.093%
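
To give a sense of scale for the density figure, the short sketch below combines it with the prompt configuration listed above. It assumes, and this is not stated on the dashboard itself, that the density is the fraction of tokens on which the feature activates and that it was measured over the same 238,145 prompts of 512 tokens. The function names are illustrative only.

```python
# Figures copied from the dashboard configuration above.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY_PERCENT = 0.093


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Rough count of token positions where the feature fires.

    ASSUMPTION: density is the share of tokens with a nonzero activation,
    measured over the same prompt set. The dashboard does not confirm this.
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    n = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = expected_active_tokens(ACTIVATION_DENSITY_PERCENT, n)
    print(f"total tokens: {n:,}")
    print(f"approx. active tokens: {active:,.0f}")
```

Under those assumptions the feature would fire on roughly one token in a thousand, which is consistent with a sparse feature tied to a narrow family of "weak" tokens.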