Explanations

influencing or manipulating people

Negative Logits

" chaw"  -0.09
" zdję"  -0.08
"Antwort"  -0.08
" гара"  -0.08
" велосипед"  -0.08
" vatten"  -0.08
"Svar"  -0.08
".ext"  -0.08
" پانی"  -0.08
"उत्तर"  -0.08

Positive Logits

" persuasion"  0.15
" propaganda"  0.13
" persuasive"  0.12
" manipul"  0.12
" manipulate"  0.12
" psychological"  0.12
"心理"  0.11
" manipulation"  0.11
" deception"  0.11
" Manip"  0.11

Activations Density 0.064%
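
For readers unfamiliar with these figures, the following is a minimal sketch of how logit lists and an activation density of this kind are commonly derived. It assumes the usual convention that a feature's logit effect on a token is the dot product of the feature's output direction with that token's unembedding vector, and that density is the fraction of tokens on which the feature activates. The page itself does not state its method, so the function names, the two-dimensional toy vectors, and the toy activation counts are all hypothetical.

```python
def logit_effects(decoder_direction, unembedding, vocab):
    """Score each token by the dot product of the feature direction
    with that token's unembedding vector (assumed convention)."""
    effects = []
    for token, vector in zip(vocab, unembedding):
        score = sum(d * u for d, u in zip(decoder_direction, vector))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


def activation_density(activations):
    """Fraction of token positions where the feature activation is above zero."""
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Toy data (hypothetical): five tokens with 2-dimensional unembedding vectors.
vocab = [" persuasion", " propaganda", " chaw", ".ext", " the"]
unembedding = [
    [0.30, 0.00],
    [0.25, 0.05],
    [-0.20, 0.10],
    [-0.15, 0.00],
    [0.00, 0.20],
]
decoder_direction = [0.5, 0.0]  # hypothetical feature output direction

effects = logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(effects, k=2)

# Toy activations (hypothetical): the feature fires on 64 of 100,000 tokens.
activations = [1.0] * 64 + [0.0] * (100_000 - 64)
density = activation_density(activations)

print("positive:", [token for token, _ in positive])
print("negative:", [token for token, _ in negative])
print(f"density: {density:.3%}")
```

In this toy setup the tokens aligned with the feature direction rank as positive logits and the opposed ones as negative, and a feature firing on 64 of 100,000 tokens has a density in the same range as the figure reported above.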