INDEX

Explanations

is allowed or not allowed

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

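NEGATIVE LOGITS

Tokens are quoted so that leading spaces stay visible.

Token          Value
(not shown)    0.62
" that"        0.60
"म"            0.57   Devanagari letter "ma"
" notify"      0.55
"ي"            0.55   Arabic letter "yeh"
" afresh"      0.54
"د"            0.54   Arabic letter "dal"
"ש"            0.54   Hebrew letter "shin"
"멍"           0.53   Korean syllable "meong"
"其他"         0.52   Chinese, "other"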

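POSITIVE LOGITS

Token          Value
" permitted"   0.84
"allowed"      0.83
"Allowed"      0.81
" permitido"   0.81   Spanish/Portuguese, "permitted"
" Allowed"     0.80
" erlaub"      0.78   German, stem of "erlauben" ("to allow")
"permitted"    0.77
" اجازه"       0.75   Persian, "permission"
" allowed"     0.75
" परवानगी"     0.75   Marathi, "permission"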
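Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest and lowest entries. The page itself does not state its method, so the sketch below assumes that convention; the function names, toy vocabulary, decoder vector, and unembedding weights are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab columns): one weight per token."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-weight tokens (positive logits, largest first) and
    the k lowest-weight tokens (negative logits, most negative first)."""
    order = sorted(range(len(weights)), key=lambda t: weights[t])
    positive = [(vocab[t], weights[t]) for t in reversed(order[-k:])]
    negative = [(vocab[t], weights[t]) for t in order[:k]]
    return positive, negative


# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = [" allowed", " permitted", " that", " notify"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.6, 0.7, -0.4, -0.2],  # row for residual dimension 0
    [0.4, 0.4, -0.2, -0.4],  # row for residual dimension 1
]

weights = logit_weights(decoder_dir, w_u)
positive, negative = top_and_bottom(weights, vocab, k=2)
print("positive:", [tok for tok, _ in positive])  # [' permitted', ' allowed']
print("negative:", [tok for tok, _ in negative])  # [' that', ' notify']
```

On a real model the same routine would run over the full vocabulary; k = 10 gives lists of the length shown on this page.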

Activations Density: 0.131%
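Activation density is usually reported as the share of tokens on which the feature's activation is above zero. A minimal sketch under that assumption; the function name, the zero threshold, and the toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens, across all prompts, whose activation exceeds threshold."""
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Hypothetical activations for three short prompts (one value per token).
acts = [
    [0.0, 0.0, 1.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0],
]
density = activation_density(acts)
print(f"{density:.3%}")  # 20.000%
```

At the 0.131% shown here, the feature would be active on about 1.3 of every 1,000 tokens. For scale, the dashboard configuration of 238,145 prompts at 512 tokens each corresponds to 238,145 × 512 = 121,930,240 tokens, though the page does not say whether density is measured over that set.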