INDEX
Explanations
Common English words
This neuron lights up on tokens that belong to titles or section headings (e.g., names of films, books, articles).
Negative Logits
csv           -0.07
twenty        -0.07
dued          -0.07
婆            -0.07
talent        -0.07
hypertension  -0.06
те            -0.06
viewers       -0.06
$/)           -0.06
亮            -0.06
Positive Logits
قاب         0.07
riteln      0.06
ürlich      0.06
LOD         0.06
companyId   0.06
Autos       0.06
redefine    0.06
izontally   0.06
mír         0.06
Experience  0.06
Activations Density 0.065%
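
The density figure above is usually read as the share of sampled tokens on which the neuron fires at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 13 active tokens out of 20,000 sampled tokens.
sample = [0.0] * 19_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.065%"
```

A density this low would mean the neuron is silent on the vast majority of tokens, which fits a neuron with a narrow trigger such as title or heading tokens.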