Explanations

destroying, destruction, breaking, or smashing

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each
Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token          Logit
"છ"            0.37
"cosx"         0.36
"戍"           0.35
"ometrical"    0.34
"blkid"        0.34
"icioso"       0.34
"Goss"         0.34
"餉"           0.34
"}.\"          0.34
"/-}$"         0.33

Positive Logits

Token             Logit
" разру"          1.69
" destroyed"      1.68
" destruction"    1.66
" destroy"        1.57
" Destroy"        1.57
"破壊"            1.57
" destruir"       1.54
" destroying"     1.52
" destru"         1.52
" destroys"       1.50

Activations Density: 0.066%
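
To give a sense of scale, the density figure can be combined with the prompt configuration above. The sketch below is only a rough estimate: it assumes the 0.066% density is a per-token rate measured over the same 238,145 prompts of 512 tokens each, which the page does not state. The function and constant names are illustrative, not part of any dashboard API.

```python
# Rough scale estimate from the dashboard figures.
# Assumption (not stated on the page): the activation density is the
# fraction of tokens, over the dashboard prompt set, on which the
# feature is active.

NUM_PROMPTS = 238_145            # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512          # from "Prompts (Dashboard)"
ACTIVATION_DENSITY_PCT = 0.066   # from "Activations Density"


def estimate_active_tokens(num_prompts, tokens_per_prompt, density_pct):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total_tokens = num_prompts * tokens_per_prompt
    active_tokens = total_tokens * density_pct / 100.0
    return total_tokens, active_tokens


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY_PCT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,.0f}")
```

Under that assumption, the feature would fire on roughly 80 thousand of about 122 million tokens, which is consistent with a sparse feature tied to a narrow concept such as destruction.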