INDEX

Explanations

hate speech

The neuron strongly activates on words naming head coverings (e.g., hat, hood, cap).

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token  Logit
" পর্বে"  0.59
"ితే"  0.57
"цкі"  0.57
" warmer"  0.55
"物质"  0.55
"따"  0.55
"warm"  0.55
" тепло"  0.55
"垍"  0.53
"ѳ"  0.53

Positive Logits

Token  Logit
"HOT"  0.59
"Vez"  0.59
"Hat"  0.58
"クトル"  0.57
"chet"  0.56
"Hot"  0.54
" Objects"  0.53
" CHECK"  0.52
" hivyo"  0.52
"),]),"  0.52

Activations Density 0.193%
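
To give a sense of scale for the figures above, the following is a rough sketch in Python that combines the prompt count, the prompt length, and the activation density. It assumes that the density is the fraction of token positions on which the neuron has a nonzero activation, and that every prompt is exactly 256 tokens long; neither assumption is stated on the dashboard, and the function names are illustrative only.

```python
# Figures copied from the dashboard; everything else is an assumption.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
ACTIVATION_DENSITY = 0.193 / 100  # 0.193% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions, assuming every prompt has the same length."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of positions where the neuron fires,
    assuming density = active positions / total positions."""
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)

print(f"total token positions: {total:,}")
print(f"estimated activating positions: {active:,}")
```

Under these assumptions the neuron fires on roughly one token position in every 500, which is consistent with a feature tied to a narrow set of words.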