INDEX
Explanations
Conversational writing style
The neuron activates broadly on ordinary lexical tokens, especially common function words and high-frequency content words, in effect marking the words of generally “normal” text.
Negative Logits
.icons          -0.07
dney            -0.07
█               -0.07
INDEX           -0.07
.userdetails    -0.07
(clone          -0.07
ً،              -0.06
"), ↵           -0.06
ltra            -0.06
[max            -0.06
Positive Logits
casually        0.06
resourceId      0.06
thiên           0.06
โปร             0.06
كت              0.05
hatt            0.05
plagiar         0.05
raisal          0.05
acidic          0.05
SQ              0.05
Activations Density: 0.172%