Explanations

state of being harmed or affected

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

渋    0.76
 reunions    0.74
 reunion    0.73
্ধ    0.72
нцикло    0.71
Mental    0.69
煖    0.69
actéristiques    0.69
 मानसिक    0.68
冶    0.68

Positive Logits

 attacked    2.26
 bị    1.95
 assaulted    1.82
 disrupted    1.80
 threatened    1.77
 destroyed    1.75
 affected    1.72
 damaged    1.66
 harmed    1.65
遭到    1.62

Activations Density 0.278%
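
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes, without confirmation from the page, that the density is the fraction of tokens in the dashboard prompt set (238,145 prompts of 512 tokens each) on which the feature has a nonzero activation. The function name `estimated_active_tokens` is hypothetical and used only for illustration.

```python
# Assumption: activation density = fraction of dashboard tokens on which
# the feature fires. The page does not state how density is computed.
PROMPTS = 238_145          # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512    # from "Prompts (Dashboard)"
DENSITY = 0.278 / 100      # from "Activations Density 0.278%"


def estimated_active_tokens(prompts: int, tokens_per_prompt: int, density: float):
    """Return (total tokens, estimated number of tokens where the feature fires)."""
    total = prompts * tokens_per_prompt
    return total, round(total * density)


total_tokens, active_tokens = estimated_active_tokens(
    PROMPTS, TOKENS_PER_PROMPT, DENSITY
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

If the density was instead measured on a different sample than the dashboard prompts, the estimate scales with that sample's token count.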