Explanations

peace, civilization, morality

Negative Logits

Token              Logit
".Initial"         -0.08
" үлкен"           -0.08
"енно"             -0.07
" systematic"      -0.07
" labai"           -0.07
" acknowledged"    -0.07
"’où"              -0.07
" вельмі"          -0.07
" zware"           -0.07
"истем"            -0.07

Positive Logits

Token              Logit
" peaceful"        0.09
" civilized"       0.09
" morality"        0.09
"お気"             0.08
" morals"          0.08
" revolution"      0.08
" współ"           0.08
" révolution"      0.08
"反水"             0.08
"\tclass"          0.08

Activations Density: 0.011%