Explanations

release of or releasing harmful

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

രോധ  0.64
 있었  0.56
 போட  0.55
 않았  0.55
е  0.54
perceptron  0.54
modation  0.53
protection  0.53
}}(\  0.53
 ため  0.52

Positive Logits

 releasing  1.22
 releases  1.18
释放  1.18
 RELEASE  1.12
 release  1.11
 Release  1.03
解放  1.02
 liberate  0.99
release  0.99
 freed  0.98

Activations Density: 0.220%