INDEX

Explanations

offensive word detection

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

"하거나"  0.46
"하면서"  0.46
"여"  0.41
" монта"  0.40
" 있다는"  0.40
"하지"  0.40
"Reached"  0.39
"kadot"  0.39
"타"  0.39
"하지만"  0.39

Positive Logits

" Peralta"  0.45
" ethnic"  0.44
" redesigned"  0.43
" ajustes"  0.43
" patrols"  0.43
" peine"  0.43
" revamped"  0.42
" surveillance"  0.41
"zan"  0.41
" Ethnic"  0.41

Activations Density 0.001%
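
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes, without confirmation from the page, that the density is measured over the dashboard prompt set listed under Configuration (238,145 prompts of 512 tokens each) and that it is the fraction of token positions where the feature is active. The displayed 0.001% is rounded, so the result is only an order-of-magnitude estimate.

```python
# Rough scale check for the "Activations Density" figure.
# Assumption (not stated on the page): density = active token positions /
# total token positions in the dashboard prompt set.

PROMPTS = 238_145          # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 512    # from "Prompts (Dashboard)"
DENSITY_PERCENT = 0.001    # from "Activations Density", rounded as displayed

total_tokens = PROMPTS * TOKENS_PER_PROMPT
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(f"total token positions: {total_tokens:,}")
print(f"approx. active positions: {round(approx_active_tokens):,}")
```

Under these assumptions the feature fires on roughly a thousand token positions out of about 122 million, which fits a feature with a narrow label such as "offensive word detection".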