Explanations

references to various forms of actions or acts, particularly those associated with violence or moral implications

Negative Logits

Token             Logit
" hobbies"        -0.70
"eff"             -0.68
" intrusion"      -0.67
"acid"            -0.65
"aples"           -0.65
" Fighters"       -0.64
" Odyssey"        -0.64
"bones"           -0.64
" implants"       -0.63
" appointment"    -0.62

Positive Logits

Token             Logit
"EngineDebug"     0.90
"uates"           0.80
"rica"            0.78
"onet"            0.77
"ilitary"         0.72
"imaru"           0.72
"iru"             0.72
"iable"           0.71
"icular"          0.71
"irie"            0.71

Activations Density 0.038%