Explanations

bribery and corruption

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token  Logit
" unpredict"  0.45
" angr"  0.45
"召唤"  0.45
" रोमांच"  0.43
" Robot"  0.43
" Tough"  0.42
"继"  0.42
"倔"  0.42
"axial"  0.41
" reactance"  0.41

Positive Logits

Token  Logit
" corruption"  1.38
" corrupt"  1.36
" corrupción"  1.31
" improper"  1.21
" embezzlement"  1.19
"corruption"  1.15
" fraudulent"  1.14
" bribery"  1.13
" questionable"  1.12
" shady"  1.12

Activations Density 0.056%
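
To give the density figure a sense of scale, here is a rough back-of-the-envelope sketch. It assumes, and the page does not confirm this, that the density is the fraction of tokens on which the feature fires and that it was measured over the same dashboard prompts listed under Configuration. The variable names are illustrative only.

```python
# Figures taken from the Configuration and Activations Density entries.
num_prompts = 238_145        # prompts in the dashboard set
tokens_per_prompt = 512      # tokens per prompt
density_percent = 0.056      # reported activation density, in percent

# Assumption: density = (tokens where the feature is active) / (all tokens).
total_tokens = num_prompts * tokens_per_prompt
active_tokens = total_tokens * density_percent / 100

print(f"total tokens:  {total_tokens:,}")
print(f"active tokens: about {round(active_tokens):,}")
```

Under that assumption the feature would be active on a few tens of thousands of tokens out of roughly 122 million, which fits a narrow concept such as bribery and corruption.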