Explanations

the word "attack"

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

" attack"       -3.06
"attack"        -2.55
" attacks"      -2.53
" Attack"       -2.53
"Attack"        -2.31
" attacked"     -2.27
" ATTACK"       -2.25
" Attacks"      -2.16
" attacking"    -2.16
" ataque"       -2.00

Positive Logits

"seamnă"          0.81
" itſelf"         0.61
" forceps"        0.54
" similaire"      0.54
" throm"          0.53
"ğim"             0.52
" productName"    0.52
" igjen"          0.52
"documentclass"   0.49
" apapun"         0.49

Activations Density: 0.633%
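
To put the density figure in context, the following is a rough sketch of how many dashboard tokens it might correspond to. It assumes, and the page does not confirm this, that the density is the fraction of all dashboard tokens on which the feature has a non-zero activation. The function names are hypothetical and exist only for illustration.

```python
# Figures copied from the dashboard configuration.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.00633  # 0.633% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens across all dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of tokens on which the feature fires.

    ASSUMPTION: density is the fraction of tokens with non-zero
    activation; the dashboard itself does not spell this out.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)
print(f"total tokens: {total}")
print(f"estimated active tokens: {active}")
```

Under that assumption the feature would fire on roughly one token in every 158, which would be consistent with a feature tied to a single word family.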