
Explanations

policy and its contexts


Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

' lavander'  -0.82
'ワイイ'  -0.81
'delle'  -0.79
' have'  -0.78
' necesitas'  -0.73
' schönes'  -0.73
'MB'  -0.73
'航天'  -0.72
' наблю'  -0.71
' alberto'  -0.69

Positive Logits

' Policy'  1.01
' enforced'  0.99
' Policies'  0.98
'"},'  0.98
'аа'  0.97
' policy'  0.96
' påvir'  0.95
'Pol'  0.92
' menangkap'  0.91
' برای'  0.90
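One common way to produce logit lists like these is a logit-lens projection: multiply the feature's decoder direction by the model's unembedding matrix and read off the most positive and most negative vocabulary entries. That this dashboard computes its lists this way is an assumption, not something the page states. The function and parameter names below (`top_logit_effects`, `w_dec_row`, `w_unembed`) are hypothetical, and the demo uses toy matrices rather than real model weights.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Project one feature's decoder direction into vocabulary space.

    Assumption: logit effects = w_dec_row @ w_unembed (direct logit lens).
    w_dec_row: shape (d_model,); w_unembed: shape (d_model, vocab_size).
    Returns (positive, negative) lists of (token, logit) pairs.
    """
    logits = w_dec_row @ w_unembed            # one score per vocabulary token
    order = np.argsort(logits)                # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 4, vocabulary of 5 tokens (values are made up).
    w_unembed = np.zeros((4, 5))
    w_unembed[0] = [0.5, -0.2, 1.0, -0.8, 0.1]
    direction = np.array([1.0, 0.0, 0.0, 0.0])
    vocab = [" policy", " have", " Policy", " lavander", "MB"]
    pos, neg = top_logit_effects(direction, w_unembed, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With real weights, `w_dec_row` would be one row of the SAE decoder and `w_unembed` the model's unembedding matrix; `k=10` would match the ten entries shown in each list above.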

Activation density: 0.011%
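If the density figure is the percentage of tokens on which the feature fires with a positive activation, and if it was measured over the dashboard prompt set listed under Configuration (both are assumptions; the page does not say), the numbers above imply roughly how many tokens activate this feature. The helper names below are hypothetical.

```python
import numpy as np


def activation_density(acts: np.ndarray) -> float:
    """Percentage of token positions where the feature activation is > 0."""
    return float((acts > 0).mean() * 100)


def implied_active_tokens(n_prompts: int, tokens_per_prompt: int,
                          density_pct: float) -> tuple[int, float]:
    """Total tokens in a prompt set and how many a given density implies are active."""
    total = n_prompts * tokens_per_prompt
    return total, total * density_pct / 100


# Configuration values from this page: 24,576 prompts of 128 tokens each.
total, active = implied_active_tokens(24_576, 128, 0.011)
print(total, round(active))  # 3145728 346
```

Because 0.011% is itself rounded, the implied count could fall anywhere from about 330 to 360 tokens; under these assumptions the feature fires on only a few hundred of roughly 3.1 million tokens.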