INDEX

Explanations

sexual harassment and misconduct

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted
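
As a rough sanity check on the prompt configuration, the total number of token positions can be computed directly from the two figures above. The sketch below also estimates how many of those positions would activate the feature, under the assumption (not stated on the page) that the reported activation density of 0.004% is measured over this same prompt set. The names `total_tokens` and `estimated_active_tokens` are hypothetical helpers introduced only for illustration.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128

# Activation density as reported on the page, in percent.
# ASSUMPTION: the density is measured over the same prompt set.
ACTIVATION_DENSITY_PERCENT = 0.004


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of token positions where the feature fires."""
    return total * density_percent / 100.0


TOTAL = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
ACTIVE = estimated_active_tokens(TOTAL, ACTIVATION_DENSITY_PERCENT)

print(f"total token positions: {TOTAL:,}")
print(f"estimated active positions: {ACTIVE:.1f}")
```

Under that assumption the feature would fire on fewer than a hundred of roughly two million token positions, which is consistent with a very sparse feature.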

Negative Logits

速度与

-0.06

anti

-0.06

urf

-0.06

วด

-0.06

散去

-0.06

ayat

-0.06

相思

-0.06

宣传单

-0.06

材料与

-0.06

建议使用

-0.06

Positive Logits

件

0.06

iak

0.05

人身

0.05

求

0.05

 focusing

0.05

 semiclass

0.05

foc

0.05

癌

0.05

 univers

0.05

害

0.05

Activations Density 0.004%