Explanations

divisive and controversial topics

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

" appBar"  0.71
" बलात्कार"  0.66
" стомато"  0.63
" শাস্তি"  0.63
"MyAdmin"  0.62
" steals"  0.61
" steal"  0.61
"꿇"  0.59
" Buildable"  0.59
" বিদ্যা"  0.59

Positive Logits

" polarization"  2.34
" polarized"  2.30
" polarisation"  2.11
" polarised"  2.10
" polar"  2.10
" divides"  2.08
" polarizing"  2.04
" divisions"  2.04
" divide"  2.01
" Polarization"  1.98

Activations Density: 0.245%