Explanations
hypothetical fights
The neuron fires most strongly on tokens in questions of the form “who would win in a fight…”; that is, it detects the key words and punctuation of queries about the outcome of a fight.
Negative Logits
824            -0.07
WC             -0.06
ього           -0.06
'ét            -0.06
compact        -0.06
hammer         -0.06
naval          -0.06
yyyyMMdd       -0.06
.epoch         -0.06
getService     -0.06
Positive Logits
.Children       0.07
vince           0.07
男              0.06
Kat             0.06
плит            0.06
gener           0.06
landers         0.06
compounded      0.06
portal          0.06
Hib             0.06
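The logit lists above rank vocabulary tokens by how much this neuron directly lowers or raises their logits. Below is a minimal sketch of how such lists are commonly produced. It assumes each value is the dot product of the neuron's output weight vector with the corresponding column of the unembedding matrix, and it ignores LayerNorm and any rescaling the dashboard may apply. The names `logit_effects` and `top_and_bottom`, along with the toy weights, are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of one neuron on every vocabulary logit (hypothetical helper).

    w_out: the neuron's output weight vector, length d_model.
    W_U:   unembedding matrix stored as d_model rows of vocab_size entries.
    Effect on token t = sum_i w_out[i] * W_U[i][t]; LayerNorm is ignored.
    """
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [
        sum(w_out[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_out = [1.0, -0.5]
W_U = [
    [0.1, 0.0, -0.2, 0.3],
    [0.2, -0.1, 0.0, 0.1],
]
effects = logit_effects(w_out, W_U)
negative, positive = top_and_bottom(effects, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; a real model with a large vocabulary would typically use a vectorized top-k instead of a full Python sort.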
Activation Density: 0.027%
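The sketch below assumes activation density means the fraction of sampled tokens on which the neuron's activation exceeds zero. Under that assumption, 0.027% corresponds to roughly 27 firing tokens per 100,000. The function `activation_density` and its `threshold` default are hypothetical, not the dashboard's documented definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: 27 firing tokens out of 100,000.
sample = [1.0] * 27 + [0.0] * (100_000 - 27)
density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```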