Explanations

titles or roles associated with occupations or positions of authority

Negative Logits

"pires"     -0.65
"ornia"     -0.64
"izoph"     -0.62
"uras"      -0.62
"icated"    -0.62
"iable"     -0.61
"ABLE"      -0.61
"rays"      -0.61
"ray"       -0.60
"¯¯¯¯¯¯¯¯"  -0.60

Positive Logits

"ullah"         0.70
" understands"  0.67
" resent"       0.66
" frown"        0.66
" forg"         0.63
" gladly"       0.63
" disagreed"    0.62
" disagrees"    0.62
" flips"        0.62
" forgot"       0.60

Activations Density 0.572%