INDEX
Explanations
TV Shows
This neuron activates on the names of TV shows (proper-noun series titles) mentioned in the text.
Negative Logits
紹介        -0.07
Když        -0.07
prosince    -0.07
zoals       -0.06
drops       -0.06
října       -0.06
ная         -0.06
xbox        -0.06
_g          -0.06
toy         -0.06
Positive Logits
Howell      0.06
            0.06
aaa         0.06
Costa       0.06
.MSG        0.06
Province    0.06
condos      0.06
CREATE      0.06
FILES       0.06
addresses   0.06
Activations Density 0.003%
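One way to read the density figure, assuming it means the fraction of token positions on which the neuron fires at all, is sketched below. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its
    activation is strictly greater than the threshold.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up data: 3 active positions out of 100,000 tokens.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3%}")
```

Under this reading, 0.003% corresponds to roughly 3 active tokens per 100,000, which is what one would expect from a neuron that responds only to a narrow category such as TV show titles.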