The neuron fires strongly on long runs of identical or nearly identical consecutive tokens, i.e., the same word repeated many times in a row.
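To make the described pattern concrete, the sketch below finds runs of identical consecutive tokens in a token list. It is an illustration only, not how the explanation was produced. The function name `repeated_runs` and the `min_len` cutoff are hypothetical choices. It uses exact matching, so the "nearly identical" part of the explanation is not captured.

```python
from itertools import groupby


def repeated_runs(tokens, min_len=3):
    """Return (start_index, run_length, token) for every run of identical
    consecutive tokens that is at least min_len long.

    Hypothetical helper: exact string equality only, so near-duplicate
    tokens (e.g. differing in case or spacing) break a run.
    """
    runs = []
    pos = 0
    for tok, group in groupby(tokens):
        n = sum(1 for _ in group)  # length of this run of equal tokens
        if n >= min_len:
            runs.append((pos, n, tok))
        pos += n
    return runs


print(repeated_runs(["a", "b", "b", "b", "b", "c", "d", "d"]))
# [(1, 4, 'b')]
```

Positions flagged this way are where one would expect the neuron to activate if the explanation holds. Comparing them against the recorded activations is a quick sanity check of the auto-interp label.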
Negative Logits
melt          -0.08
extensive     -0.07
STATES        -0.06
Uri           -0.06
문자          -0.06
زر            -0.06
phrase        -0.06
gzip          -0.06
arg           -0.06
_other        -0.06
Positive Logits
ویژه          0.08
Ù             0.07
discoveries   0.07
ра            0.07
erotiske      0.07
."',          0.06
_attachment   0.06
arsenal       0.06
MongoClient   0.06
playlists     0.06
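Logit lists like the two above are commonly produced by multiplying the neuron's output weight vector by the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that procedure. The page does not state its exact method, and real tools may also fold in the final layer norm. The function name, toy vocabulary, and weights are hypothetical.

```python
def direct_logit_effects(w_out, W_U, vocab, k=3):
    """Project a neuron's output weights onto the unembedding.

    w_out: list of d_model floats (hypothetical neuron output weights).
    W_U:   d_model rows x len(vocab) columns (hypothetical unembedding).
    Returns (bottom_k, top_k) as lists of (token, score) pairs,
    most negative first and most positive first respectively.
    """
    scores = [
        sum(w_out[i] * W_U[i][j] for i in range(len(w_out)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w_out = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0, 0.3],
       [0.4, 0.2, -0.2, 0.0]]
neg, pos = direct_logit_effects(w_out, W_U, ["a", "b", "c", "d"], k=2)
print([t for t, _ in neg], [t for t, _ in pos])
# ['b', 'a'] ['d', 'c']
```

Note that these scores only describe the neuron's direct effect on the output. With values around ±0.08, the effect is small and unrelated to the repetition pattern in the explanation, which is typical for neurons whose role is mostly mediated by later layers.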
Activation density: 0.003%
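Activation density is usually the percentage of token positions on which the neuron is active. A density of 0.003% corresponds to roughly 3 active positions per 100,000 tokens. The sketch below assumes "active" means the activation exceeds a threshold of 0. The dashboard's actual threshold is not stated, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumption, not the dashboard's documented value.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 3 active positions out of 100,000 tokens -> 0.003 (%)
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.2
print(activation_density(acts))
```

At this density the neuron is extremely sparse, which fits an explanation tied to an uncommon pattern such as long runs of repeated tokens.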