Explanations

under attack or strike

The neuron fires on words signaling violent or aggressive actions—especially attacks, assaults, strikes, or being “under fire.”

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token         Logit
"Appellee"    -0.78
"Excessive"   -0.72
" ибо"        -0.72
"ItemName"    -0.72
"octobre"     -0.70
"疫"          -0.69
" Zeugen"     -0.69
"leč"         -0.68
"olat"        -0.68
"olato"       -0.67

Positive Logits

Token         Logit
" attack"     6.00
" attacks"    5.59
"attack"      4.81
" attacked"   4.66
"Attack"      4.34
" Attack"     4.31
"attacks"     4.16
" ATTACK"     4.03
" Attacks"    3.97
" ataque"     3.92

Activations Density 0.096%
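
To put the density figure in context, the short sketch below estimates how many tokens it corresponds to. It assumes, and the page does not state this, that the density is measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each) and that it means the share of tokens with a nonzero activation. The function names are illustrative only.

```python
# Values copied from the dashboard text.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.096  # reported activation density, in percent


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(n_tokens: int, density_percent: float) -> float:
    """Estimated token count with nonzero activation.

    Assumption: density is the fraction of all tokens on which the
    neuron activates, computed over this same prompt set.
    """
    return n_tokens * density_percent / 100.0


n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(n_tokens, DENSITY_PERCENT)

print(f"total tokens:     {n_tokens:,}")
print(f"estimated active: {round(active):,}")
```

Under those assumptions the neuron would be active on roughly 3,000 of about 3.1 million tokens, which fits a narrowly selective unit.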