INDEX

Explanations

identifying changing dynamics

New Auto-Interp

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

കസ    0.44
সং    0.43
 आरमार    0.43
到时候    0.42
设计    0.42
赎    0.42
是为了    0.42
Preference    0.41
larını    0.41
原则    0.41

Positive Logits

 detect    0.94
 suspected    0.89
 detects    0.82
 detectar    0.74
 detecting    0.71
 detection    0.67
 hidden    0.66
 suspicion    0.66
 undetected    0.65
 detected    0.64

Activations Density 0.221%
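
As a rough sense of scale, the figures above can be combined into a token count. This is only a sketch: it assumes the activation density is the fraction of tokens in the prompt set on which the feature fires, which the dashboard does not state explicitly. The function names are hypothetical and not part of any dashboard API.

```python
# Back-of-the-envelope arithmetic from the dashboard figures.
# ASSUMPTION: "Activations Density" is the fraction of all tokens in the
# prompt set on which the feature is active. The page does not confirm this.

NUM_PROMPTS = 238_145        # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512      # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.221      # from "Activations Density"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated number of tokens on which the feature fires."""
    return total * density_percent / 100.0


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    active = estimated_active_tokens(total, DENSITY_PERCENT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {active:,.0f}")
```

Under that assumption the feature would be active on roughly 270 thousand of about 122 million tokens, which is consistent with a sparse feature.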