
Explanations

Questions or phrases related to ethical considerations and societal issues, particularly those involving racism and harmful stereotypes.

Hypothetical or counterfactual "if ... were" constructions.


Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1


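Negative Logits

"難しい"  0.49  (Japanese: "difficult")
" eventuali"  0.47  (Italian: "possible, any")
"結局"  0.45  (Japanese: "in the end, after all")
" liệu"  0.45
"顰"  0.45
"ية"  0.43
"併"  0.42
"ließlich"  0.41
" ஏனெனில்"  0.40  (Tamil: "because")
"ন"  0.40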

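Positive Logits

" থাকত"  0.50  (Bengali: "would be, used to be")
" olisi"  0.45  (Finnish: "would be")
"нови"  0.44
" থাকিত"  0.42  (Bengali: formal spelling of থাকত)
" وقلنا"  0.42  (Arabic: "and we said")
"...."  0.39
"isher"  0.39
" were"  0.38
"就好了"  0.38  (Chinese: "it would be nice", as in "if only ...")
"Were"  0.37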
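The page does not say how the logit lists are computed. One common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the k tokens with the largest and smallest scores. The sketch ignores any final normalization layer; `logit_effects`, `decoder_vec`, `unembed`, and `vocab` are hypothetical names, and the toy numbers are made up for illustration.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return (top-k positive, top-k negative) tokens for one feature.

    decoder_vec: the feature's decoder direction, length d_model.
    unembed: unembedding matrix as d_model rows of len(vocab) columns.
    Each token's score is the dot product of decoder_vec with that
    token's unembedding column (an assumed scoring rule).
    """
    d_model = len(decoder_vec)
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative score first
    positive = ranked[::-1][:k]   # most positive score first
    return positive, negative


# Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
decoder = [1.0, 0.0]
W_U = [[0.5, -0.2, 0.9],
       [1.0, 1.0, 1.0]]
pos, neg = logit_effects(decoder, W_U, ["a", "b", "c"], k=1)
print(pos, neg)  # [('c', 0.9)] [('b', -0.2)]
```

The sketch keeps signs, so negative effects come out below zero, whereas the lists above show the values without a minus sign. A real unembedding matrix has tens of thousands of columns, so a vectorized array library would normally replace the inner loop; plain lists keep this example dependency-free.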

Activation Density: 0.077%
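Activation density is commonly read as the share of tokens on which a feature's activation is above zero; that definition and the zero threshold are assumptions here, not stated on the page. The sketch counts active tokens over a batch of prompts, then does a rough scale check that hypothetically assumes the 0.077% figure was measured over the 238,145 × 512 dashboard tokens listed under Configuration (the page does not say which sample was used). `activation_density` and the constant names are illustrative.

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    `activations` holds one sequence of per-token activations per prompt.
    The zero threshold is an assumption, not taken from the dashboard.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy example: 2 of 6 tokens are active.
toy = [[0.0, 1.2, 0.0, 0.0], [0.5, 0.0]]
print(activation_density(toy))  # 0.3333333333333333

# Hypothetical scale check: assumes the 0.077% density was measured over
# the dashboard prompts (238,145 prompts x 512 tokens each).
N_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
total_tokens = N_PROMPTS * TOKENS_PER_PROMPT  # 121,930,240 tokens
approx_active = total_tokens * 0.077 / 100    # about 93,886 tokens
print(total_tokens, round(approx_active))     # 121930240 93886
```

Because 0.077% is itself rounded, the active-token estimate is only approximate under that assumption.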