Explanations
Super Bowls
The neuron activates on mentions of “Super Bowl” (often with its Roman‐numeral designation).
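As a rough illustration of the surface pattern this explanation describes, the sketch below finds "Super Bowl" mentions in text and captures a trailing Roman numeral when one is present. It only approximates the described text pattern and is not how the neuron itself computes anything. The pattern `SUPER_BOWL` and the helper `find_super_bowl_mentions` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical pattern: the literal phrase "Super Bowl", optionally followed
# by whitespace and a run of uppercase Roman-numeral letters (e.g. "XLII").
# The numeral is not validated; any run of I, V, X, L, C, D, M is accepted.
SUPER_BOWL = re.compile(r"\bSuper Bowl\b(?:\s+(?P<numeral>[IVXLCDM]+)\b)?")


def find_super_bowl_mentions(text):
    """Return a list of (matched_text, numeral_or_None), one per mention."""
    return [(m.group(0), m.group("numeral")) for m in SUPER_BOWL.finditer(text)]


if __name__ == "__main__":
    sample = "They won Super Bowl XLII and lost the Super Bowl in 2015."
    for matched, numeral in find_super_bowl_mentions(sample):
        print(matched, "|", numeral)
```

One known limitation of this sketch: a sentence such as "Super Bowl I watched" would be read as carrying the numeral "I", because the pattern cannot tell the pronoun from the numeral.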
Negative Logits
fp          -0.07
?url        -0.07
Yar         -0.06
svém        -0.06
Dios        -0.06
尋          -0.06
لی          -0.06
面          -0.06
_ADDRESS    -0.06
Modify      -0.06
Positive Logits
Finn        0.07
confirming  0.06
support     0.06
reserv      0.06
شکل         0.06
concerned   0.06
tip         0.06
conflicts   0.06
prosecuted  0.06
Dangerous   0.06
Activations Density: 0.005%