Explanations
The neuron is looking for mentions of a specific name, "Ah"
occurrences of the name "Ah" with varying intensity
Negative Logits
Token        Logit
Colossus     -0.74
Daredevil    -0.73
etary        -0.72
Ó            -0.67
Stronghold   -0.66
Collider     -0.65
Purg         -0.65
eering       -0.65
eers         -0.64
Starg        -0.63
Positive Logits
Token        Logit
ahah         1.02
renheit      0.96
azard        0.93
ghan         0.92
umen         0.91
undai        0.91
resh         0.90
olics        0.89
uates        0.89
oy           0.88
Activations Density: 0.014%
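
The density figure can be read as the share of sampled tokens on which the neuron fires. The sketch below illustrates that reading only: it assumes "fires" means an activation strictly above zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 14 of which activate the neuron.
sample = [0.0] * 99_986 + [0.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.014% corresponds to roughly 14 active tokens per 100,000 sampled, which is consistent with a neuron that responds to a narrow pattern.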