Explanations
mentions of the human body part "head"
references to the head as a noun in various contexts
Negative Logits
Token     Logit
PLA       -0.79
liga      -0.78
ieth      -0.74
issance   -0.67
itives    -0.65
rep       -0.64
abiding   -0.63
ovo       -0.63
ools      -0.62
ols       -0.62
Positive Logits
Token     Logit
head      3.63
heads     2.76
Head      2.56
Head      2.23
head      2.08
HEAD      2.02
Heads     1.89
heads     1.61
skull     1.59
neck      1.54
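Lists like the negative and positive logit lists above are typically produced by ranking a per-token logit effect and keeping the k largest and k smallest entries. The sketch below assumes that convention; the function name `top_and_bottom_logits` is hypothetical, and the sample pairs are a few values copied from the lists. A list of pairs is used instead of a dict because distinct tokens can render identically (for example "head" appears twice in the positive list, likely with and without a leading space).

```python
import heapq

def top_and_bottom_logits(effects, k):
    """Return (positive, negative) lists of (token, logit) pairs.

    Hypothetical helper: `effects` is a list of (token, logit) pairs.
    positive holds the k largest logits in descending order;
    negative holds the k smallest logits in ascending order.
    """
    positive = heapq.nlargest(k, effects, key=lambda pair: pair[1])
    negative = heapq.nsmallest(k, effects, key=lambda pair: pair[1])
    return positive, negative

# Sample pairs taken from the lists above
sample = [("head", 3.63), ("heads", 2.76), ("neck", 1.54),
          ("PLA", -0.79), ("liga", -0.78), ("ols", -0.62)]
pos, neg = top_and_bottom_logits(sample, 2)
print(pos)  # [('head', 3.63), ('heads', 2.76)]
print(neg)  # [('PLA', -0.79), ('liga', -0.78)]
```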
Activation Density: 0.024%
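Activation density is commonly reported as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the helper name `activation_density` and the 100,000-token example are hypothetical, chosen only to show how a figure like 0.024% arises (24 active positions out of 100,000).

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper; assumes a feature counts as active when its
    activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical example: 24 active positions out of 100,000 tokens
acts = [0.0] * 100_000
for i in range(24):
    acts[i * 4000] = 1.5  # mark 24 distinct positions as active
print(f"{activation_density(acts):.3f}%")  # 0.024%
```

A feature this sparse is active on roughly one token in every four thousand, which fits a narrow concept such as references to the head.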